Lab 41:- Lookup caching modes and Cache Transform.

MSBI Step by Step Training Lab 41:- Lookup caching modes and Cache Transform.
28 minutes
Share the link to this page
Copied
  Completed
In this video we will look into different caching modes available in the Lookup Control.

  Download

Transcript

So welcome to lab 41. In lab 41, we will look into different caching modes available in the lookup control. If you remember, in lab seven, I had covered the basics of the lookup control. So in case you want to know how to use the lookup control, etc, you can go and see lap seven. In this lab, my focus would be to first go through the three caching modes available. One is no cache, partial cache, and full cache.

Then after that, we'll go through when to use what so in what scenarios I should use the full cache in what scenarios or should use the partial cache and in what scenarios I should use the no cache. And also we'll go through one important control called as the cache transform control, which will help us to manage the cache cache centrally. So in this 30 minutes of video, I have a three point agenda first, going through the different modes of the cache lookup, right? Second to understand in what scenarios to use what and third one to go through the cache transform control. So let's start with look up caching modes in SSIS. So now here is a simple look of example, which we will be executing in this project.

So you can see that we have a simple text file here, which has the salesperson ID and the sales made per day. So you can see that we have salesperson ID two has made 100 sales in one month 2016. Then again, we have one which is in one to 2016. Again, salesperson two has made two sales in the following dates right. So now I want to go and take this say this text file and do a lookup into the table and take the salesperson name over here. So I want to go and create a report you know where I would like to see the salesperson name and the and the and the sales made per day right.

So let us Go ahead. And so let us go ahead to a project here, right. So you can see that I have created a very simple project. And there is a data flow here called as lookup sales. So I would like to go here and first look up the text file. So I want to go and first create a data source.

So let me create a data source, which is a flat file source. And this flat file source will read the text file, right? So this text file at this moment is I think he's in C drive with the name salesperson details, right? So I'm going to go here and browse to the salesperson details. txt file. So there it is.

Let's see this is CS person. Details. Browse. So we'll go and take this right. So from here, I'm getting the salesperson ID right. And this I will look up for the name in the database, right?

So I will go and say okay, now, I would like to go and look up into the database, right, so I'm going to go and take the lookup, right? And I will be. So in this video, our concentration is on cache mode. So you can see that we have three cache modes here in the lookup, the full cache mode, partial cache and no cache, right? So at this moment, I will be selecting no cache, we'll start with the most simplest one. So let us go to Connection Manager and let us take customer data warehouse.

So from this I will be loading the DMCA For some I will be looking up in the gym salesperson. So basically I'll be going and joining both of them and yeah there is a direct mismatch because this salesperson because you know in our database right the salesperson ID is in right and when we read from the flat file source it is normally it is normally not it is normally a string right. So, let us go to the flat file source, flat file source Connection Manager and let us make the salesperson ID to be read as a four byte sign in teacher white sign in teacher right right So there it is. So I'll go here now. And now my lookup should work. Right?

So dim sales person and lookup. So I want to get the salesperson ID, the sales date, the seals, the the seals made the seals date and from here I want to get the salesperson name from the database right okay. So, now, this lookup has been configured for no cash right. So, now what will happen? The SSIS package will execute. So when we run this package, it will actually execute and it will first run the flat file source.

So, from the flat file source, it will take the first record to write and look up into the table. It will then take the second record look up into the table right. So let us go ahead and run the SQL Server profiler and illustrate to understand that How exactly is this lookup behaving in no cash in terms of looking up into SQL Server, right? So in order to do that, I'm going to go and run SQL Server profiler. Now, in case you are new to SQL Server profiler, I would suggest to go and watch this video where we explain what exactly is a profiler and what it does, right. So I'm going to go and run SQL Server profiler.

So over here, I'm not interested to go and listen to audit logins, log logout or like, how many connections are made, right? So I'm not interested in that. Neither I'm interested in so many columns, you can see the lots of columns here. I'm not interested to see all of these columns, but yes, I'm interested to see that what kind of text data that is SQL is getting fired, right? So I'm going to remove all of this so that We can see the things more clearly. Right?

So that is, right. And what I'd like to also do is I'm only interested to hear the query is made to customer data warehouse, right? So to this database, so I'm going to go and also put one more column that is the database name, right? So I'm going to put the database name. And I'm going to go and apply a filter here saying that I'm only interested to hear from customer data warehouse and not from you know, everyone, right? So I'm gonna go and say run here.

So, there it is. You can see that it is now hearing two queries which have been fired on customer data warehouse, right. So let us go ahead and run this right so we'll go and run This and let us see what is happening in our profiler right. So, in our profiler, you can see first thing, if I scroll from the top there are some statements here you know which are like like describe and declared prepare right. So, let us get these statements because this prepare an unprepared is nothing but it is not exactly executing the SQL statements, but it is actually creating a handle and then later that handle is executed down below. So, basically, the SQL execution actually is happening over here So, you can see that there are four SELECT statements the first select statement is firing on to the second SELECT statement is firing on one, then again on two and again on three right.

So, this last four statements you know indicate that there were four SQL statements which were fired so you can see here when our a text file was read. So he took this too. So you can see the first SQL statement is saying your to, and it caught the name. Look at the second one, one. So again, he went here, we can see a one, then we have two years, it is two, then we have three is three. So basically, when you say it is no cash for every lookup, it's going to go to SQL Server and fire the SQL statement.

Right. So this option here indicates that irrespective of whatever, right it's going to always go to SQL Server and get the data right. But now think about it. You know, if you have a lot of repetitive sales person ID over here right? Then this can be a problem right? You are hitting unnecessarily right.

So is it possible that I can go into a cache. So I can go and say okay, like, let us first load all the master data. So master data means the salesperson, right? And then just look up from the cache, right? So for that we have the second option here. That's the second option that is full cash right.

So let me take this full cash here. And let us see that how it behaves right. So I'm going to go and Savior and let me go back and clear all of this. Let me run this. So now what I expect is that he should actually be using the cache right? So let's go here and you can see now there are a lot of prepare unprepared statement leave that but you can see that there is only one execute state Why do I see only one?

Let me just quickly check here I should be seeing some more statements okay. Check it again. So far yeah, I think you can see that that is okay. Basically Here it is. So, you can see it is saying here actually it was there you know I just missed it. So, we can see now here is only one SP execute statement and you can see this passing the value one.

So what is this one? This one is a handle right. So you can see in the previous statement, he has created a handle with the value one and this handle is nothing but it is actually fetching the full dim salesperson. So we can now see that rather than going and making Trip again and again, it is just loading the whole data once and then looking up from the cache, right. So, the first option, this option, it is no cache. So for every record, it will go and get the data from SQL right it will make a live query.

While in the second option, it will first go and load the full table right into the cache and then look up from the cache right. Now, the third cache mode here is the partial cache. So, what exactly is the partial cache? A partial cache you can think about is nothing but it is like a on the fly cache or I will say, a kind of runtime cache okay. So, what happens here in partial cache? So, first, your SSIS package runs and he reads the first record.

So, first thing is your cache will be empty. So, when you initially run the package There will be no cash right? So we won't first take the first record he will say okay salesperson ID two he will say okay does this salesperson ID to exist in my cache, if it does not exist, it will go and it will fire a SQL statement to the SQL Server, get that record and cache it and then run the and then pass the record to the down below. It will then take the second record one it will then go and again search in the cache. So in the cache, he will he will get you will get one one record that is two. So that means that one is also not existing in the cache.

So again, goes and files SQL statement and brings the record in the cache right. So in the cache, now, we have two and we have one right. Then it goes to the third record to he again goes and searches in the cache and this time he will find the record in the cache. So because he finds the record in the cache, he will now not fire an SQL statement. Do SQL Server, but take the data from the cache and execute it. So in short, you know, partial cache is nothing but you know, as the data comes in, it actually starts caching it builds up the cache, right?

So in short, you know, when we go and run this package, now, we should get one of the execute SQL statements, right? You know, one for two, right? 141 and four, three, this two will be taken from the cache, right? So let us see if this happens or not. So I'm going to go and clear my trace window. So let us go and run this and we'll see what happens here.

So in short, we should get three execute SQL statements okay. And you can see down below, right, you can see down here, first one is two, right. So basically we had the record too, so that is for it. Second, you can see in One. So that is for the second record. Again, that is two, but this time he does not go and fire into their squeal right?

He actually reuses from the cache. So the third one here is for the three. So here you can see now, he has only fired three times they squeal statement. Okay. And for this third record he has used from the cache. So partial cache means rather than loading the whole data into the cache, you load on demand.

So now the next question which comes to our mind is that in what scenario should I use the full cash, you know, when should I use a partial cash? And when should I use the no cash, right? So in order to understand the same, what I've done is I've created four criterias here, and these four criterias can be, you know, your decision making criteria as you know, by which you can decide when to use a partial cache and when to use a no cache and when to use a full cache, right? The first criteria is the size of the data, like how big is the data? Right? Is it like a million records, then definitely probably you don't want to cache a million records right?

How volatile is the data? Is it the master table? Or is it a transaction table right? So is it not too volatile right? Or is it constant right? So, is it too old tile or not too old tile?

The next thing is that, you know, where do you want the performance to be good? Like, do you want the performance to be good during the startup of the exercise package? Or do you want the performance to be good during the runtime right. So, first thing if the data is small, then I would like to go with a full cache. So load the cache once right and basically just start looking up from the loaded cache right. So in case of a full cash now in case of full cash, definitely you To startup performance will not be good right because the cash will be loaded during the startup right during the SSIS application is loaded right.

So, there will be an impact on the startup performance, but then the runtime performance would be great right. Second, Now, think about it like your data is small right, but it is too volatile. In other words, at the back end, the inserts and there are new records which are getting inserted. So, in this case, definitely full cash is not good, right. So, you would like to go for a partial cash, because new data is coming in, right. And in this case, you know, I think the performance of the startup and runtime will not be my decision making criteria because when the data is volatile at the back end, right, I do not have option but to use a partial cache, isn't it?

Now if the data is large now in case of lost data, I would really go for caching. But now let us say the data is large. And there are few records which are looked up, right. And I still want to do caching right, then in that case, definitely I will go for a partial cache. So partial cache will definitely have an impact during the runtime, right? Remember, partial cache will always have an impact during runtime.

Now, let's say that your data is large, and that is also too volatile. So you know, records are getting updated inserted now and then right, then definitely, there is no point of caching because it's better to read the data from SQL Server directly right? Or else you will end up with some kind of a defect. So in case the data is large, and it is too volatile, then I would go for a no cache, okay. Now, by default, if you don't want to get into this confusion, right, what I would say is that, start with about Gosh, because partial cache is on demand. So start with a partial cache and let the data fill, you know, as the lookup runs.

But in case you know, you are sure that there is a master table which is referenced aggressively and the master table is too small in size, which your RAM can handle, then definitely go for full cash right? So by default, if I'm if I'm sure that there is a lookup on a transaction table and that and I'm going to get a benefit I would go for a partial cash right. But in case I find like that is city master and country Master, which is referenced again and again and again, then I would like to go for a full cache. Now in SSIS, we have a very nice component here called as the cache transform. Now, this caching what we are doing at this moment, we are using oledb connection managers. You can see that there is something called as a cache Connection Manager also All right, so what is the difference between both of them, if you're using the oledb Connection Manager, and if you're selecting the cache mode, then the cache is applicable only for that data flow.

In other words, after this lookup, if I again do a lookup, you know and if you have selected oledb then the cash will be used, but now, let us say, I have two separate data flows. Let's say that this is lookup sales and there is lookup sales one like this. Now, in this case, what will happen is the cache which is created on this data flow will be used only in this data flow. And it will not be used across the whole control flow. So the this cache will not be used in this lookup sales. So if you see for example, let me go and clear the trace window and let us go and fire SSIS let us go and run SSIS.

So, if you see over here, you will find there are two SP execute statement. Remember, I have chosen full cash at this moment. And you know this prepare and prepare statements, you can just leave. But you can see that there are two SP execute statements. So the first one is probably for the first first control data flow, I'm sorry. And the second one is for the second data flow.

So now, definitely This is not good, right, what you would like to do is, you'd like to go and create a centralized cache, right? And then that cache, you want to go and use a reuse in all the data flows, right? And that's where we have something called as the cache. Transform, right? So what I'm going to do is I will go and I will create one data flow. And the whole purpose of this data flow is to configure the cache, right?

So I'm going to go here and create a data flow. And the goal of this data flow task is to load cache. So the cache gets loaded centrally. And after that, the lookups use this cache, right? So I'm going to go here. And let me select the dotnet source.

So remember, we are using the salesperson name has a lookup. So we're going to go and select salesperson, select columns. So I'm just interested in the person ID and the person name and I will say OK, right. And now let us go ahead and select the cash transform. So it's a new skill set, this is the sales person cash. Right?

Now, if you wish you can also do is you can go and you can load the cache into a file and this file gets stored in a where your SSIS package is running. So with this what will happen is you don't have to go to the source again and again, right, you can just take the data from the cache, isn't it? So I can just so let me use this so I'm gonna go and browse and create one file. So this file I can name it as country cash. So country cash, right. Good.

Now sorry, since cash, I'm sorry, sales person, cash. salesperson cache. So, I will say yes, I need all of this right. So you can see that automatically he has figured out the data type and that is good, right? And I need to create, okay. I think he will create one.

So let us leave it. So Uranus Okay, we need to have one index column. So the index column, you know, is like a primary key so that's so for the index column I've made one right. And all the other columns I made zero right? mappings Yes, that's it. Okay.

So now what I've done is I've created a separate cashier a cash transform you by the name, salesperson cash, and what we will do is, you know, henceforth in all the data flows right I'm going to go and use this cache. So, I will go here I will say Edit and I will say that I want to use the cache Connection Manager. So, in the connection, I would have chosen 6% cache and the other things are coming in same good. So, let's go here to this lookup sales and tell him also that do not use the oledb Connection Manager use the cache Connection Manager right seems for some cash, right columns. Good. Right.

So now if you see our cache is loaded centrally, first right and then it is reused again and again in all the lookups. So now let us go and see that do we see now multiple SELECT statements right, so I'm going to go into To the SQL Server profiler and let me run this. So, first the cache will be loaded and then the cache is reused again and again. So, let us go and see here and you can see that okay there are a lot of SQL state you can see that there are a lot of statements, but these statements are not SQL this FMP one only statement gets the data so, the data structure of the table right. So, all of the statements you can leave, but look at this statement here. Select star from dim salesperson, right and please note that these two are not different SQL statements, this is SQL batch starting and this is SQL batch completed.

So, basically it is just one SQL statement which is fired. So, you can see now, only one SQL statement is fired at the start of the cache and it is reused again and again right. Now, also one more thing about cache Connection Manager which I would like to bring to your note In that this cache Connection Manager can load from any source. So, you can see here like, you know, if you remember in the lookup if you see, we can just connect via oily TV right. So, we can just connect via oledb right and only DB can probably just connect to SQL server or can connect to our DBMS right. But now, let us say you want to load cash from XML file, let us say, right.

So, in that case, definitely you will go for a cash transform, right. So, you will go and you will read the XML file right by using the XML source or whatever right and then you will go and load the cached transform right. Now, one of the one of the is a problem or unseen limitations of cache transform is that it does not do any kind of partial caching, it is full caching, right. So basically, whatever you load here is available throughout, right. But I think, you know, the benefits of putting it in a file and all that, you know, definitely is welcome. Right.

So in case if you're thinking about full caching, then I would suggest that give a thought of using the cache transform. So I hope that this session was useful. In this session, we were trying to understand what are the different caching modes in the lookup control and when to use what and we also did the cache transform control in this video, right. So thank you very much.

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.