Lab 28:- Buffer Size Tuning (SSIS)

MSBI Step by Step Training Lab 28:- Buffer Size Tuning (SSIS)
32 minutes
Share the link to this page
Copied
  Completed
In this videos we will explain about buffer size tuning in SSIS.

  Download

Transcript

In this lab, we will discuss about a very important topic from SSIS performance and that is buffer size tuning. Now, when data moves between SSIS components, they do not move sequentially they move in buffers means what? For example, over here you can see that I have two components here one is oily DB source and the other one is a conditional split component. Now, assume that this oily source is reading from a table and the table has let us say three records for assume that there are three records and these three records are read by the oledb source and sent to the conditional split component. So, these three records will not go sequentially that means, you know it will not send first first row and then second In third row, it will actually send all of these three records in one single buffer. So as SSIS developer, it is very important for us to know that how to size this buffer how to size this buffer optimally, I'll say and see that you know, maximum number of rows fit in.

So the buffer should not be too big, the buffer should not be too small. So how do we go and fine tune this buffer so that we can have better performance? Now, the first question that will come to your mind is that okay, one is that we have to size this buffer. But the second thing is that where do we go inside this buffer? What is the default value of this buffer? How many rows can fit into a buffer?

Where is this things defined? So the thing is all defined in the data flow tab, so go to the data flow task, right click and go to properties. When you go to the property window, you will find that there are two properties here. One is the default buffer max rows, and the other one is the default buffer size. The default buffer max rows is the number of rows which can fit into a buffer. And the default buffer size is the size of a single buffer in bytes.

So you can see over here, the default buffer size you know what you see over here is nothing but it is a size in bytes. And in this one single buffer, the SSIS would like to fit 10,000 rows. So now take this simple situation here. I have an extremely simple table here with two fields, which has a size of 4000. So each one of these fields over here has a size of 4000. So the so one row size of this table is 8000 bytes and This table has approximately 433735 rows right.

So, in other words, let me just open an Excel sheet over here for our calculation purpose. So, I will say that okay single one row size is 8000 bytes and total rows is 433 75 right and the total buffer size by default is this 10485760 and the total number of rows which should be fitting into this buffer size is 10,000. So, this is the current situation we have right. So, let us say that we want to go and read all of these records into this conditional split here, right. So, let us see, you know in this current situation what will happen Now, before we proceed, let us do a bit of maths here. So, if one row size is 8000 bytes and total rows are 44 33735 that means the total size is number of rows one row size into number of rows.

So, this is a total data what needs to be moved from one component to the other component. So, that means, it is the total size then let us see that how many buffers we will need right. So, total number of buffers needed. So, if you see this year, it will be this divided by this. So, that means we will need approximately three 33 330 buffers in total to be created to travel the data from one component to the other component. And also we will calculate actual number of rows that will go in one buffer.

For example, over here it is saying that okay 10,000 rows should go in a single buffer, but will it so what I'm going to do is in order to achieve arrive, I'm sorry, to the actual number of rows, what we need to do is we need to divide the total number of rows with the number of buffers right? So we can see that the actual rows which will go into one buffer is 1310. And over here, he's saying that please try to fit in 10,000 which actually cannot happen because the one row size here is 8000 bytes. If you see here, if that buffer sizes this and total number of rows that has to be fit is 10,000 that means you know the size that was expected expected one rule size was this buffer size divided by number of rows. So, that means he was expecting approximately 1000 bytes and here you are eight times more right.

So, definitely the actual number of rows cannot be 10,000 it is just 1310 that is 1300 rows okay. So, this is the current scenario we have right. So, let us do like this, let us go and execute our program. And let us see that you know if whatever we are seeing here is working the same way with the program or not. Now, in order to confirm our calculation what we did right you know where we concluded that approximately 1300 rows will go into one buffer We'd like to go and see that is that really happening over here or not. So for that, what we will do is we will go to the control flow, and we will enable logging for data for the buffer size, right.

So I'm going to go here to the logging year. And I will say that yes, I want to go and add a new log here. So on the package level, I would like to go and add logs to my text file. And these logs I want to write it on my C drivers bear. So I will say always create a new file. Okay, so don't overwrite on the existing one.

So what I will do is in my must be in my D drive here, I will create a file here called as a txt file here called as buffer size right? I would like to write my buffer size values used into this text file here. Now, remember that this buffer size happens on the data flow tab. So, you can see here on the package level you can go and you can log a lot of events here. But the buffer size actually buffer sizing actually happens on the data flow task right. So, click on this buffer size on the data flow task menu here and you can see that there is something called as the buffer size tuning checkbox.

So go and check this text checkbox and say OK, so by doing so, what will happen now, you know, you are now going to log in the size of the buffer that is used by your SSIS package. Remember, the buffer size tuning is not available at the package level it is available at the data flow task level. So do not make a mistake here or else you know, you could be very confused you know where this checkboxes right so that is good. All set and let us go and run the program. So I'm going to go and run this program in a debug mode. Remember at this moment I am running this program with 10,000 rows and with the default with a buffer size of 10,000 rows what I have here and the default buffer size 10485760 right.

So, let us go and start running and let us try to see that what happens here right So, you can see now and remember the number of records are very, very high at this moment. So, this will take some time to run. You can see that it is it is running over here and let us see that how much time it takes and does it match or calculation of not so that you can see that he has loaded the records for 33735 good. The first thing what I would like to do is you know before I go and check the buffer size, I would like to see that how much time it has taken in has taken approximately 11 seconds and 43 it has taken 11 point 4311 seconds approximately right. So, what I will do is I will go here and I will see that time taken with this kind of buffer size.

So, time taken was 11 point 43 right, this is with a buffer size of this remember this value we are getting with a buffer size of this. Now, the first thing I would like to go and confirm my calculation here that how much records did he pass in a single buffer right? So, let us go to our D drive. So, I'm going to go to my D drive here. And let's open this file year and if you go and scroll you're safe if you see this, you can see that it is saying that There will be only 1308 rows in the buffer. That means he's saying that you said that you want to fit 10,000 rows.

But you know, the maximum rows, which is fitting here is 1308. I do understand, you know, when I look at my calculations, you can see that we actually estimated 1310 So, don't worry too much about that two bytes, okay, it's fine. But what I'm saying is that whatever we thought in our calculation, the same thing is actually happening on real time okay. So, that is good. But now think about it. So, that means that the actual rows which are getting transferred are 1310.

And you know, the capacity what you wanted to transfer was 10,000. So, that means that you know, you are not utilizing the buffer properly, right. So, in other words, you know, the other thing probably which can come to your mind is Should we go and increase this buffer size? You know, so that probably we can reduce this time, right? Or should we probably go and minimize the buffer time, the buffer size so that we can improve performance? Right?

So probably you'd like to see that you know, what is the best option here. So, let me take the first thing, what I will do is I will go and I will increase the size of the buffer and let us see that what will happen. So what I'm gonna do is I'm gonna go and copy this all thing here Ctrl C, and I will just paste it here. Ctrl V, okay. So my thought here would be to go and increase the buffer size. So this was with the default values, right what I ran, and the one which I will now do will be with the increased values, right?

So I'm going to go and increase the buffer size and see that what is the time taken. So now what we'll do is let us go and increase our buffer size eight times. Okay. So, if we do that then we would be able to transfer 10,000 rules in one buffer right which is actually better than this 1300 rules right. So, what I will do is I will take this and let us see that, do we get any performance benefits or not so, I'm to say that okay, I'm increasing it eight times and I am seeing that now I should be able to transfer 10485 which is nothing but 10,000 rows right. So let us go and run this program and let us see that how much time it takes.

So that it is executing and after this, we will see that how much time it has taken. Okay, that it does. And you can see that the time taken is more actually you can see it is 16 seconds. Which is bad, right? So, in other words by increasing the buffer actually I did not benefit. So, in this case, I will say that the time taken was 16 seconds, right?

So this was with this increased buffer. Now, what I'll do is let us go and reduce the buffer size. Let us see that if that works, so, I'm going to go and copy paste this again and this time I would like to have new sets of reading. So, in this case, I would like to say that I would like to reduce the buffer size. So must be let us make it something like eight right. So if I could make it let us make it that is too small.

Right? something better Okay, so finally Drew's right so you can see now what I've done is I just in the buffer size, and I'm trying to send half of the rules, you know what I've sent previously right? So let me go back here. And let us see that with this will I have an improvement? So first I increase the buffer, something happened after that I decreased the buffer of decrease the buffer. So let's see what happens.

So I'll say okay, I'm gonna go and put this and the expected rose, your is 560. So I'll say that I'm expecting like 560 or whatever, you know, 570 rules to be transferred. Right? With this size. And let us see what happens. So that I'm running you can see now it is running more faster as compared to previous one because now in one buffer size, we are transferring small amount of records, right?

So that's why it is moving faster. But do it This thing also the number of buffer size increases right the number of buffers increases and you can see this till 16 seconds right. So, that means, so over here also I will say this is 16 seconds, okay. So, you can now see that, you know, by adjusting the buffers, you know, I'm trying to see that what is the best way to achieve my goal to achieve my performance, but you know, I can understand that you know, over here if I just keep my default buffer size everything would be fine. So, the point here is is the point from this experiment what I was trying to make is that you need to see that what is the optimal buffer size for your program, okay. So, that was the that was the goal of this exercise.

Now, this is a very simple situation. Let me go and complicate this program. Let me put one more sort of component here and one more conditional split component here, right? So what I'm going to do is I'm going to go and put a sought component here, right? And if you go unsought So, in my conditional split at this moment, I'm saying that whichever lengths are greater than zero, just go ahead and over here, I will say that go and sort by both the values, okay. And after that, I would like to just go and put this whole thing into one more if condition here.

So let me again, put a conditional split. So I'm just making the program more complex at this moment. So I'd say that, check that if the length of the value is greater than zero, so again, I'm doing the same kind of check. So what I've done now is actually I am actually Am I've just complicated the program and I would like to see that what is happening right? And then we'll take the readings for both for you know for the three scenarios what we discussed here, so we'll actually go and first two with the buffer size of 10485760 then with the increased buffer size and then with the decreased one right, so, again, we will repeat this experiment and see that if that is really an improvement, right. So let us go and run the program here.

And let us see what happens. So now you can see here, a very important watch the screen now this is very, very important. What you're seeing can see the rows are going from oledb source to conditional split, and it is also going to the source, but you know after the sort the rows are not going ahead. So you can see from the conditional split. Some of the rules are moving ahead as they are processed, but from The SOT none of the rules are moving ahead what is happening over here, right. So, this is very important you know this thing what you're seeing on the screen where you know after the SOT The rules are not going right.

Now, remember that there are three kinds of components in SSIS. And these components you know do the processing in three different ways. The first type of component is termed as a blocking component, a blocking component is a one which you see like the salt. So, you can see the salt what it says is that, you know, I will process all the rules and then I will move ahead all the data So, till all the rules are not processed, nothing is going to move ahead. So, these kind of components are termed as blocking components. You can see the conditional split component here, you know, what it is doing is that as rows are processing he's actually just pushing it ahead.

So, this is called as the non blocking component and there is a third kind of a component called a semi blocking, where some of the rows are blocked, but some of the rows actually move ahead. So, remember three types of components one is a blocking component where the component blocks the complete flow and when the component finishes is processing, done only he gives the rules ahead. The second kind of component, you know, which is no blocking, you know, where as soon as the rule is processed, it is moved ahead. And the last one is semi blocking, you know, where some of the rows are blocked, but, you know, as the rules are processed, you know, the other rules which are processed, you know, their mobile move that right, so, so this is, again, very important, and we'll talk about more on these different kinds of component that is semi blocking and blocking as we move ahead in the SSI chapters, but at this moment, I think it is very important to highlight these three processing types.

So, let's Finish this experiment it's going to take a bit time here I'm going to go and pause the video until this thing finishes. But I hope that semi blocking blocking and no blocking is understood. So, let us wait for this experiment to complete remember, we are at this moment running with the low buffer size. So we are we are actually selected the low buffer size and we are doing this experiment here. Okay. So, you can see that that the loading has finished and it has taken nine minutes and 44 seconds.

So, remember that this is with the lowest value of buffer size Okay, so nine minutes and 44 seconds now what I'm going to do is I'm going to go and repeat this experiment for the other two guys as well. So remember that this one is with the salt component it has taken nine minutes and 42 seconds okay? Quickly, let me just see as 44 seconds, whatever. Okay? Now in the same way, what I'm going to do is I'm going to go and run this experiment again with the larger buffer size, and the mid buffer size. Again that and let me go and record those readings.

So I had, you know, I'm not going to go and repeat this experiment on life on the camera here. But I will just note down the readings, and then we can start looking at those readings. So there are my readings here. So readings for the buffer size with the default buffer size, with the buffer size increased and the width the buffer size decreased. And you can see here that the increased buffer size, it took me four minutes and 43 seconds, as compared to the default buffer size, it was 19 minutes. And with those with the small buffer size, it was nine minutes.

So you know with the increased number of buffer size, actually I'm benefiting here. Now, remember You know initially when I just use the if condition I was not seeing so much difference it was you know, just in seconds right. But you know, once I started putting the sort component and other complicated component that I'm seeing more difference in terms of the execution. So, in simple words, it is very important to see that the buffer size which we give over there is proper is optimal and we do not have issues right. Now, the next question which comes is that So, how do we calculate the buffer size or at least how do we ensure that you know whatever buffer at this moment is present the default buffer is good or not. So, here is a small excel sheet I have here you know wherein I'm using my way of deciding my buffers okay.

For example, here now, let us assume that your row size is 8000 byte So, Your one row size is 8000 bytes assume okay total total number of records or something okay. Now let us say the actual server RAM is 16 gb let us say let us assume that the actual server on which your SSIS package will run a 16 gb now, from that existing ram you don't give all the ram to by application right you know some ram you keep because you want other applications of the server to function properly. So, I will say okay I can allocate to SSIS, let us say, four GB RAM, okay, so four GB RAM. When I say I'm allocating, that means I'm allocating four GB of buffer size to SSIS. So the allocated buffer size now this is in GB and you can see below it is in bytes.

Okay. Now the total buffer needed for this row size and this record is this much and how, however right? I've arrived by multiplying the row size into the total number of records. So, you can see a normal allocated buffer in bytes is this much and the total buffer is this much. So, my my number of buffers will be whatever is the total buffer needed divided by this right. So, you know the number of buffers for transferring these many records are 1.5 that means approximately two and so, we also need to also calculate that in every buffer how many rows can be fit right.

So, rows which which can fit right number of rows which can fit. So, number of rows which can fit is nothing but the total number of records divided by the number of buffers right. So, we can see that we can fit In these many rules okay. So, in short from this calculation what we have arrived to is we have arrived to the default buffer size. So, this default buffer size is nothing but this one. So, this is the default buffer what you need to put over there right and we are alive to the second thing that is default macros right.

So, this is the macros. So, the whole point of this calculation is that we want to arrive to an optimal buffer size you know depending on how much RAM we are allocating to our SSIS package. So, in other words, you know do not just keep the default size you know most of the time the default size works because it is a very good number out there. But sometimes you know you feel that yes you know you have a very large record size and you have a lot of records moving from one component to another component, then I think you know one calculation, doing one calculation and ensuring that you are utilizing the Resources properly is very essential. So so that was a very, very crude method or I'll say you know, a method by which you can arrive to the buffer size and the number of rows.

Now, one more important point to remember here is that as we said that the buffer size matters a lot. So the buffer size should be so much, you know what you need, not more than that, for example, over here at this moment, you know, in this case, our total record size in SQL Server is 8000 bytes per record, right? So we need to ensure that at the end of the day, whatever records are coming into an SSIS, they actually get typecast it to the SSIS data type. So we need to ensure that the data types over here, you know, should match to your SQL Server. For example, you can see over here in the value, it is saying 4000 is the length that is a 4000 bytes, and it is DT underscore str. I'm sorry for that.

So that was it. So, it just so basically over here if you see this is right because yes you know my data type in SQL Server is carrying data type of 4000 bytes. And here it is DT underscore str. So that means it is a non Unicode byte right remember in one of our previous lectures, we discussed about this Unicode and non Unicode, but now, think about for example, sometimes what happens as you can see this is oledb source, let me quickly pull up here a do dotnet source very quickly you must be wondering that, why I did not use a do dotnet over here right. So, let me just quickly go here and pull up a do dotnet. So, if I go here, and let me pull up a dotnet source Now, look at this, this is very interesting.

Let me go here and take the same table, the TBL source and the same columns. Now, if you go and see the data type here, if you go and say Show Advanced editor and over here, if you see See, you can see that he's saying DT underscore Ws to look at this that means a doctored source even though you have defined a care data type in your table here he is taking as two bytes for one character right. So, that means that you know this has to be changed to DT str right or is what will happen, this will be not 4000 bytes it will be 8000 bytes So, 8000 year and 8000 to value to that becomes 16,000 bytes for one record okay. So, ensure that you know whatever is the structure of your record should be chosen in such a way that it should be the it should be that much you know how much you need and not more than that.

So, this is very important here. So, again, you know, you can see that I made a choice of oledb source because I saw that he's choosing the size properly. Right. So, that brings us to the end of this video. So in this video, we learned the following things we say SSI as follows a buffer oriented architecture and not a row oriented architecture. So, in other words you know, when data travels from one component to another component they travel in buffers and not in rows.

And if you want to go and address the buffer size and how many rows should fit in, then you need to go to the data flow task right click on it, and you can go and adjust these two properties that is default buffer macros and default buffer size. The third thing we discussed here is that you know, when components are processed in SSIS they are processed in three different modes you know, one is fully blocking, partially blocking and the last one is no blocking. So, whenever a component is getting processed, you know, it is processed in one of these three modes. Also, we said that, you know, one is that okay you have the source data type for example, in SQL Server, you have the data type, but the other thing we said that in SSIS, you know, they have their own data types, right. So sometimes it is very It's possible that you said over there in the source one byte, but here you know SSIS has chosen to bytes.

So, it is very essential that in order to use your buffer size in an optimal way, you need to go and check that if the column data types are in the proper format or not right. And the last thing and very important thing is that your project should have SSIS buffer calculation policy. So, there should be some kind of a policy or some kind of a calculation formula that says that how optimal buffer size should be arrived. So, these are the takeaways from this lessons and I hope that you enjoyed this video. And the next video what we will do is the next video I will again keep our SSIS video because we are doing performance tuning and everything is so connected. So, the next video would be on multi threading with SSIS.

So, in case you know when you're doing parallel tasking, how do you go about doing it? Right, so the next video is on SSIS threading Thank you so much.

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.