Lab 7:- Lookup, Data conversion optimization and updating SSIS package.(SSIS)

MSBI Step by Step Training Lab 7:- Lookup, Data conversion optimization and updating SSIS package.(SSIS)
39 minutes
Share the link to this page
Copied
  Completed
In this video we will learn about Lookups , Optimizing datatype conversion and how to update changes in MSBI.

  Download

Transcript

So let us start with lap seven. And in lap seven, our goal is to load the fact table to load the fact customer table. So at this moment, our master tables or I'll say rather dimension tables to be more appropriate, like country dimension product dimension sales and states are getting loaded from the respect to CSV file. So now the final thing what is left is to load this customer fact table. Now, let me go ahead and switch back to a project. So the first thing, what we need to fix in this project is we need to fix the read from CSV because previously when we read from the CSV, it had, you know these four columns, but now if you see, the CSV structure has lots of other columns as well.

Right. So the first thing is we need to go and ensure that This read from CSV reads all the fields. So how do we do that? How do we go and refresh the fields? Now if you see this read from CSV source component, it is using the CSV connection file Connection Manager to read the file, right? So if you see down below in your tab, you have this CSV connection, right?

And the CSV connection is the person who actually goes and connects to that physical file. So let me go here to the CSV connection. Now, if you see this CSV Connection Manager, you can see that there is no file name here. So can you guess you know why we don't have a file name here? Yes, in case you are able to get it right. If you remember when we were doing a for loop, this value, this file name value, this full file path here is set by a variable.

In other words, it has become dynamic. So that's why he has removed the filename. But now what has happened is you know we have a new structure right. So, let us go and point again back to that file so that we can refresh with the new structure. So, let us click on this browser and let us go back and let us again take this customer new now we can see, even though I have taken this customer one dot txt, right, which has more than four columns, he's still referring to that old structure right. So, in other in, in order to refresh this with the new structure, you can see that there is a button here called us reset columns.

So, let us click on this button and there it is. You can see now, he has taken up the new structure, right, and everything good and I will just say okay, Now, you can see that there is a ello sign here. This arrow sign indicates that the Connection Manager, something has changed in the Connection Manager. In other words, at this moment, if you see, the Connection Manager of the CSV is having new fees, and this SSIS component is out of sync with that Connection Manager. So first thing, what we need to do is we need to get rid of this ello sign. And we need to make this SSIS component in sync with the Connection Manager current state right.

So for that, what we need to do is we need to right click on this and we will say Edit and we will just go and see the column so you can see now all the new columns are seen. And I will just press OK. And if you see now, the ello sign has gone off. So that indicates that this SSIS component is now in sync. With the file Connection Manager now, I can see that there is a error sign here. So, let us see what this error sign is you can see that in this destination component of the database there is an error sign and if I go and read this error sign it says that the table name you entered may not exist okay. So, why is that you can see that at this moment the table name he is referring is the old table name.

So, this table name what he's referring here is TBL customer and actually we have renamed that table to fact customer. So, if you remember this central table, we have remained we have renamed it to fact and the surrounding table we have renamed it to dimension. So, let us go and refer the fact table so, I'm going to go and click on this drop down and say fact customer and let us also ensure that the mappings are proper So, I can see A call to customer code customer name to customer name. Let us talk about this ID mapping snoot ROM and let me do okay and let me see if this so you can see now, we don't have errors at this moment. So, let me just go and build this and let us try to run this project. So that our application is now running.

So you can see that it is getting executed. And at the back end, I would expect him to load the records. So there it is, so you can see the records are loading. And you can see that all the records have been loaded from both the files into the database. But now here we have a problem you can see that it has uploaded customer code, customer name, customer amount and sales date, but look at the foreign Key references everything is null. And that was expected because currently the mapping is only done for these four fields it is not done for the other four fields right.

So in other words, you know, we also need to go and populate this other fold for fields as well. Now, loading these four fields is not so straightforward. What we really need to do is we need to do a lookup. In other words, what we need to do now is if you look at this moment, the text file the text file from where we are reading the records, if you look at the way the data is coming at this moment, the data is coming with a name so you can see here in the text file, we are actually getting a name saying India Maharashtra, Nepal Katmandu you know so we are actually getting the name the text right? So what we need to do now is you know, when this file is getting loaded, we need to go and do a lookup. In other words, for example, you can see there is something called us India.

So we need to go and do a lookup. So we will go here to the table. And we will actually do a lookup, right? So what we'll do is we will say, okay, that is India, we will go and do a lookup. So what is the ID one, and this ID one will get inserted over here, in the same way for states in the same way for other data as well. Right.

Now, in order to do a look up, you know, we have a very, very nice component here in the data flow tab. So if you go to the toolbox SSIS Toolbox, we have a nice component here called as the lookup. So now once the data comes outside the conditional split, and before entering into the customer database, we need to do the lookup, right? So I'm going to go and delete this part over here. And I will put a look up over here. So let's first start with lookup of a country.

Okay, so I'll say Luka. country. So the data comes out from here, and it goes to this lookup, right? And so let us do a right click, let us say Edit. Now you can see that there is a general section here, which has lots of things like full cash, partial cash, no cash and a lot of things over here like this drop down, right? We will talk about this complete screen later on.

It needs a dedicated video to talk about these properties, okay. But at this moment, let us go and Create a connection. Okay. So what do you want? What? What do we want to achieve here at this moment, that CSV txt file has the text like it has the country name India, Nepal.

So that text will be looked up into the database into the dimension table, it will be looked up from from the lookup, what we expect is we expect the ID. So Id will come out and this ID will get inserted into the fact table. So in short, we want to look up the CSV text data into the table. And from table this lookup should pull out the ID right. So here we need to go and define now the database where we want to do the lookup so the database is this. So at this moment, I am doing lookup of the country master.

Right. So I will say dem country so do a look up on this. And the next thing what I need to say here is that I need to say that. So what do you want to look up so you can see here, at the left hand side, we have the text and this text has to be looked up into the table. So this text will be looked up into the table. And from table, I want to get this country ID as output.

So the expected output of the lookup is the country ID. So I'm going to go and drag and drop this customer name to this country name saying that do the lookup using this text on this table right. Now the timer drag and drop this you can see that we are getting a very nice error saying that there is a data type mismatch between the customer name what what you get from the text file and the data type of the database. So can you guess what it can be? So any guesses in case you are able to get It okay in case not let me answer it. If you see at this moment in our database, let us go and look at the data type of country name.

So if you go and see the data type of country name, it is n where care. N stands for Unicode, right. And when data is read from a text file, he reads it as a non Unicode. So, from the text it is now non Unicode and on the database, it is Unicode. So, he is not able to do this join over here. So, let us go back again to a project and let us go and do the conversion into Unicode for the other fields as well.

So, I'm going to go and pick up the country name, pick up the state's name, pick up the product name, pick up the salesperson name because all these data have to be looked up into the dimension table right? You can see that all of these properties are DT underscore str at this moment. So DT underscore str means it is non Unicode, right. So what I'm going to do is I'm going to say, okay, go and create a, a Unicode country name. Our Unicode states name. You Unicode product name, a Unicode salesperson.

So in other words, create a new field new sets of fields and make them as DT Ws Dr. DT w str stands for Unicode. So you can see here I have created, I'm creating extra fields here on the fly, which will give me Unicode. Right? And I will say, Okay, so now let us go back to the lookup and let us start configuring it. So we'll see. Okay, I want to go on Look up from country.

And this time, you know, when I do the lookup join, I'm going to go and use the Unicode right, I will say this is your country name, and get the country ID from this, right. Now the way we have done lookup country in the same way, we have to do lookup states, we have to do lookup salesperson, we have to do lookups product, right? So in other words, we are going to have multiple lookups. So this output of the lookup country will now go to lookup state. So let me go and again, create one more lookup states logic here. So the output of lookup country will now go to lookup states.

Now the time I drag and drop the output, he's telling me that what do you want to send here? Because lookup has now two outputs. One is the matched output, and the other one is the unmatched output. So he's asking me that, do you want to send the match output? Or do you want to send the unmatched output Definitely, we want to send the match output at this moment. So I have selected it.

And I will say, OK, and here again, the process is same. So I'll see your edit, I will go to connection. Remember, this is states. So I need to select the states here. And I need to use the Unicode thing. So I'll say your state's name I've been sent here and I will say states ID.

And I will say, okay, right. In the same way I have to do now for Look, a person look up salesperson look up products, etc, right? So the again, the process is absolutely same. So let me go ahead and complete that. So you can see now, the way I had created country lookup and states lookup, in the same way I have created the products and the salesperson. So the final lookup of you know, after all the lookups have executed is the lookup salesperson.

So let me take the output of this lookup and give it here. The final destination. And definitely I'm just interested in the mash output. So once this is done, we need to now go and map the IDs. So in other words, you can see that the lookup is giving out all the IDs. So let us go and start mapping all the IDS with respect to IDs in the table.

So I'm going to go and map all the respect to IDs I will say okay, and let us go and run this. So that it is running and it does go back and see how the data is loading. So now, in this you know, it should load with all the IDs. So you can see that it is right so you can see now, all our data is loading with the respective foreign key IDs from the master table. So in other words, now the complete process is happening first all the master data getting loaded. And after that finally, the fact table is getting loaded by using the lookup.

Now, I would like to discuss about a very typical behavior which is exhibited by the lookup. If you see look up whenever you do a lookup, you know there are two kinds of record one is you will record records which have a match and the other one is you know, you can find records which are not matched. Now, by default, the lookup fails, it stops executing that itself if it does not find a record means what for example, now, let us say, let us go back to our database. Let us go back to one of the dimension table for example, you can see sales person. Now, let us say I go and I go and add a salesperson in my text. For example, let us say Raja, okay, now, this is Roger salesperson is not is not that in my table at this moment so you can see I have shaved I have Roger have kodaka have Sham but I don't have this salesperson here Raja.

So, what will happen is the salesperson lookup will fail. In other words, if I go and let me go back to my table first and let me first go and delete all this let me go and delete this and then because at this moment I have not done a CDC on this fact table right. So it will throw me exceptions. So, let me go and execute this. So now if I go and run this, so let me go and run this. So if you see now, so that my application is running, so it is running.

So that it is loading all the master tables country and whatever it is. But when it goes to load the fact table, you will see that there is an error there in the Salesforce. Look up, you can see this red sign here, which is clearly on the salesperson. And if you go to the progress tab, and if you go to the first red sign, you can see that over here it is saying, row ill dead no match during lookup. So in other words, by default, the lookup fails it stops the program execution right there itself when it does not find a data, but this is not always the expected behavior. Most of the time, you know, what we would expect is we would expect that okay, if the records are not found, you still continue ahead with the program right?

So for that, what we need to do here is so let me go back again to my data flow, let me stop debugging. Remember that when you are debugging, you cannot go and make any changes here. So ensure that whenever you are making changes, you stop debugging, and to stop debugging, you have to click on that red sign which I just clicked. So at this moment, we have a salesperson entry here. Which is not existing right. So, let me go to the lookup of the salesperson let us right click let us say Edit and you will find that there is this drop down here you know it says that specify how to handle rows when you do not have a matching entry.

So, you can see by default it is now having this fail component. So, what I can do is I have a lot of options here one of them is ignore failure ignore failure means even if there is a failure you would like to still continue right and there is one more option here saying redirect rose to the error output. So, in case if the row is not found, you would like to redirect it to the error output error output means that red arrow So, this in case the rows are not found, it will actually come here, okay. And there is one more option here saying redirect to the no match output. Okay. So no match output means You can see now this lookup has one more arrow here, this arrow is nothing but the no match output.

Remember lookup has two outputs, one is the match output, and one is the no match output. So what we'll do for now is we will say that okay, I want to actually ignore the failure. So I want my program to continue. So let me go and run this application. So my expectation now is that I shouldn't get any errors and he should try to load all the records. So that it is executing loading products and executing my fact table and I can see that he is everything is looking green here.

Let us go back here. And let us check if the data has been loaded. And you can see that all the data has been loaded only for this ninth record here in the salesperson he has inserted null. Why null because this record code was not found, right. So you can see now, how the behavior of lookup has changed. Now, before I end this video, I would like to talk about a small improvement tip here, defining the data type at the source.

If you see at this moment, our source and destination, they have lot of data type mismatches. So basically, on the source, we have a text, a string, that string has to be converted to a currency. On the source, we had some fields, which were non Unicode, and that had to be converted to a Unicode. So in order to achieve that, you know, we had this data conversion component which is doing the conversion. Now, wouldn't it be great if this conversion is done right at the time when the data is read that means over here itself, when you're reading from the CSV if we can do the conversion, that To improve improve this improve the performance of this package to a very great extent. Now, this read from CSV component reads data wire this Connection Manager CSV connection.

So, let us go and open the CSV connection. So, let us go to this connection managers tab here and let us open this CSV Connection Manager. Now, we can see here that there is no file name here if you remember this file names are coming dynamically, but at this moment we will need it so, we are going to go and give the file name here So, that we can just go and set some properties okay. So, it is in bulk file new. So, let me open this customer one. Now in this you know in the advanced tab if you go you can see that there is a field name at the left hand side and at the right hand side you have the data type So, if you wish you can set the Data Type right here itself.

So what I'm going to do is I'm going to go and set this customer amount to currency. So, let us do that remember it is a currency right the country name to DT w str. So, let us set this to DT w str this to also DT w str this also to DT w str right. So, let us go and set all of these data type right in a when the file is read, okay. And I'll say okay alright, and you can see this ello sign here which indicates that oh something changed in the Connection Manager but it is not updated here. So just do edit and say Yes, go ahead and do okay.

So with that what will happen is you know that our changes have been applied now to the component as well. So now that the data type conversion is happening at the source itself, So we don't need this component anymore over here. So we don't need this conversion component, right. So I'm going to go and give this output here to the sorry, to the conditional split, I'm sorry. So I'm going to go and give output to the conditional split. You can see that lots of red signs here, but don't worry, we have to fix it, we'll fix it.

At this moment, this conversion error I'm going to go and for now, I will go and delete this, we'll see that how to go and fix the conversion error problems, right. So you can see a lot of definitely read science here because all those fields have the Unicode and the currency has gone off right. So let us start fixing each one of them. So first thing is let us start from the conditional split. I did remember, you know, when you make such kind of big changes, you know, it's going to go and affect all the project down below. So be very careful when you're making such kind of changes.

In your project especially in production right, but fine at this moment we want to improve performance so we needed so you can see here this customer amount see why is no more there right? We can directly use the customer amount now so I'm going to go and say customer amount and customer amount because the customer amount at this moment is a currency okay? So I'll say okay, so you can see the conditional split error has gone off right so in the same way let us go and fix all of the errors what we see here so I'll say Edit. Let me go back here. So in this case, also this column has gone off so I'm going to go and delete this. So basically, we need to we can directly use now country name here because country name itself is a Unicode right?

So I will go into this and say okay, and you can see that okay, fixed for the states. You can see the states that are That's gone off in the states do that select country I'm sorry. So let's go back to columns here or in the states I'm referring country that is bad states right okay in the same way let me go and fix the other errors here as well. So, in short what we are doing is you know, we are using the fields directly now rather than so country name right, okay. products. So, let me go and fix all of these errors column.

So, product we have to use the product name for name right and that is the CMV sales person right columns See this person name? So, you can see slowly slowly all the red signs are going off. Okay. So, you can see that I have fixed up all the errors around the lookup, but you can see that on the error it's saying that there are some mapping errors. So, let us go and do an edit and let us see that what is the error mapping I can see that this new customer name has been removed right and the customer amount ci also has been see while so has been removed right? Because now we are doing the conversion directly on the main field.

So these mappings are currently wrong. So I'm going to go and delete them and you can see that that error has gone off right. So nice. And also in terms of problem in conversion, if you remember right, you know previously whenever there was a conversion issue, it was sending to the error of destination right. So now what will happen is in case there is a conversion problem, it happened right here on this component that is read from CSV. So, what will do is let us go to the toolbox and give the error output over here.

So let us go and say flat file destination. And any kind of conversion errors will be sent over here now, so I'll say yes, when the component fields send all these. So, to this all the error conversion means conversion errors will actually be handled here. Great, I can see that there is a small sign here saying that no rows will be sent to the error output because you have not configured it right. So remember, in one of the videos, we said that you need to configure the error output. So at this moment, I want to say that it is not.

You shouldn't be failing the component. Whenever the error happens, what I want is I want it to redirect the role. Okay. So To go and say read Eric row data crow right and i will see okay nice. So and also I need to go and give a file name here right. So at this moment just a new file.

So, let us create a error conversion file I think the previous file somehow got deleted by me right. So, let me say this is our conversion file. So, there was a previous file which which we had created, but somehow it got deleted fine. So, I will see edit file, okay. At a conversion This is actually edit conversion file, right. And, yes, I need all of these columns and let's see, okay.

Themselves okay and you can see that all the red arrow signs have gone off very nice. So, now it is time to go and test this application because we have done a lot of lot of changes, but the changes were all for the good because what was happening was there was an additional process of error conversion or sorry data type conversion, which was happening separately. So, now what will happen is you know, this is more efficient while reading itself, he will decide the data type which is more appropriate which is more efficient right. Before I go and execute this program, I want to make a small statement here. Any SSIS logic, you know given to you can be written by multiple ways. For example, you saw at this moment in the in the previous projects, you know, I was doing arrow conversion, I was doing the conversion of data of data type by using the conversion component And if you see now, I'm doing conversion at the source itself.

So a single SSIS logic or any SSIS logic can be written in multiple ways. One only what you need to figure out is that what is the most efficient way? So a difference between a good SSIS developer and a Okay, okay SSIS developer is that a good SSIS developer knows how to go and write an efficient SSIS logic. Remember that SSIS programs or msba programs run on large amount of data on terabytes of data they will run. So it is very, very important that it's not only doing the work but doing the work efficiently. So that was just a small suggestion because as you start moving up in larger projects in msbi, definitely performance is one of the key factors.

So let us go and run this now Till now I was running this exercise program by clicking on the start here, right which is actually a debug mode, a debug mode means you know you are running it from the start here, because you want to go and debug something you want to go and see some variables. But if you do a Ctrl F phi, again, I'm saying if you do a Ctrl f5, it will run in a release mode. A release mode is nothing but you know, when you go to production servers, right at the time, you are not going to run your program through IE you're going to actually deploy the program you're going to deploy the binaries only right. So if you wish to go and if you wish to run this program in a in a full executable mode or in a production mode mode, production mode away, then you need to go and press Control f5 so if you want to go and run in the designer or in a debug mode, you will click on the Start Here.

But in case you want to go and run this in a full Build Mode, then you do a control f5 so if you do a Ctrl f5. What happens is he will not run inside the Visual Studio. He will actually pop up a small A dos kind of a window, right and inside the DOS window you can see that he will start executing, right. So this running is more of a production mode running. Okay? You can see that because you are not running inside the UI, he is actually throwing up all the details right away on the command window to to help you figure out in case there are problems, right.

So the same thing which I was seeing inside the designer is now seen in the console application, because I'm running this application in a full compiled mode or in a production mode, right. So there it is, lots of things is emitting out. Also, let me go back to my SQL and let us see that the data is getting loaded. So we can see that data has got loaded and the program has finished. And in case when you're running in debug mode, you want to see that what what things have happened. You can go and read this diagnostics, you know, what has been thrown to the console application.

But you can see that the application is running and it is, it is working perfectly nice. Also, what I'm going to do henceforth is whatever code you are downloading right? In that code, you will find a SQL script. So this Complete database structure, I'm going to go and script it. So let me go and create a script here, generate scripts. And let me go and generate script the entire database.

So we'll go and script the entire database and next and this script you will find in the projects folder, so whichever is the projects folder of that lab, you will find the script over there. So let me so this is customer database warehouse. Okay. Save Next, Next, so that it is generating all the scripts and it has saved the file successfully. So at this moment it is present on my desktop, I would suggest everyone to go and watch the SQL Server videos because when you are doing SSIS a lot of SQL Server things do come in. So here he must have pasted somewhere the customer data warehouse there it is.

So I'm going to go and copy this. So whatever is your project at this moment, right inside this, I'm going to go and paste this thing Okay, based it so you can see in the miscellaneous you should find this customer dw SQL, okay. So whatever is the recent SQL you will actually find in this file, okay, so Henceforth, whenever I make any database change for that lab I'm going to go and update this customer dw SQL, which will be inside the same project. So that brings us to the end of 30 minutes of this lab. And this 30 minutes, we did the following things. First thing is, we talked about the lookup component, we saw that how we can use the lookup component to do a lookup.

We also saw that how we can configure the lookup component when we do not get a matched output. And we also saw that how to go and avoid the conversion component, the data type conversion component and do the conversion right and the source itself. So when you are reading from the CSV file that itself, if you want to go and do a conversion, we saw that how we can use the Connection Manager and use the advanced tab to do the conversion thing at the source itself. And one of the things you know this video was trying to communicate to you is that when changes happens when new fields are added. You need to do a lot of changes in the project. You saw that when those four new features The foreign keys got added, I have to literally change all the components and all the Connection Manager.

So one of the things is very important is when your application is running on production, adding new fields means you need to change a lot of things in your DTS x file. Now the next video is a 30 minutes video, we will talk about two concepts merge and merge join both these concepts you know actually are on the similar lines of lookup, you know, but they have a big difference. So the next video is on merge and merge join. And with that video, we will be competing five hours of training and after that we are going to have more 27 hours still pending. And my plan is that in the remaining 27 hours to also concentrate on SSRS and SSRS as well. If you see in this five hours, we Just doing SSIS.

And when we look at msbi, it is having three sections, right SSIS, SSIS, and SSRS. So what we'll do is we want to become a complete msbi professional and not just the SSIS professional, right? So, in the coming videos, I'm going to go and stretch myself to SSIS and SSRS as well. So I hope that you are liking the series. And if you're really liking the series, my request would be to please go to facebook.com slash questpond. And just tell us that on which lab you are currently because with that, you know, we come to know that how many people are watching our labs, and they are at which lab at this moment.

And with that we understand that how much interested people are to watch our videos. So please do give us a feedback because your feedbacks helps us to channelize how to move ahead with this series. Thank you so much for the next video. On much and more join. Thank you Now

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.