Lab 23 :- Script component(SSIS)

MSBI Step by Step Training Lab 23 :- Script component(SSIS)
43 minutes
Share the link to this page
Copied
  Completed
In this video we will talk about Script component in Data Flow Tab.

  Download

Transcript

Hello everyone. In this video we will talk about script component in data flow tab. In one of our previous video, we spoke about script task in control flow tab. And we learned that script task is the task which will let us do something which is not possible using any of the other control flow tasks. In the same way, in data flow tab, we have something called as script component, it will let us do something which is not possible using any of the other source transformation or destination tasks. So, as I said before, script component belongs to data flow tab.

So, the very first step will be adding data flow task from toolbox to exercise designer as you know, data flow task will be used to perform ETL operation and for ETL operation, we will be required at least one source and one destination. For our practical, we have a sample database created as a test for destination purpose with the name as script component DB with a table called TBL customer. This table contains three columns, customer name, city name, and state name, all of which are of type where care. Now from source point of view, we have a text file called customer JSON. So far, we have completed many practicals covering text files as source and destination. But this practical is going to be a little bit different.

Because the way in which customer dot customer JSON files store data is different. Usually, we are used to see records inside text by in comma separated way commonly known as CSV format. But this time the values are stored in a little bit differently. This format is called called as JSON format. It is a very common When data comes from rest based services, you are not required to worry about rest based services and all you have to you have to know only one thing that is, in many situations, data will be available to you in the form of JSON. And in this practical in this lab, we will learn how to read this JSON data and dump it into the destination.

Now, unfortunately, in data flow SSIS Toolbox, we don't have any tasks available, which has the ability to read these kinds of JSON data. And hence, we will end up at using script component just like script tasks in control flow script component in data flow will be capable of doing something which is not possible using any of the other task. So let's get into the SSIS or Visual Studio and let's start with the practical so first thing is definitely double click the data flow task. Now, drag Script component to data flow designer and select source because we are expecting script component to behave like source then say okay then simply double click script component and go to input outputs. Now remember guys, script component is not a source components like flat file source or oil DB source where output columns are already defined. Here we have to define output columns manually because we are doing everything from our own.

So, first thing we should do is name the buffer. So, this is the name of the buffer which is going to contain the data. So, we will name this buffer as buffer is simply a memory location you can say which is going to hold the data temporarily. I will name this buffer as customer data. Don't worry, you will soon understand where we will use this customer data. You just relax and go forward with the practical for now.

Then, click On this output columns and we are seeing that our buffer is going to contain three columns, one column, named as customer name, and data type of that column will be string. second column is going to be city name. And data type of that column is again going to be string. And finally, third column state name, which is again going to be string. So we created three simple columns with data type as string. So what does it mean?

It means that script component, our script component, which is acting like source is going to pass a buffer, or data to the transformation, which is going to contain three columns customer name, city name, and state name. Now let's go to script section again, and simply click on Edit script. It will launch a new visual studio where we will go and write our code. And this is the method create new output rows, where we are actually going to write the code. So, at the first step, we will go and create a new class in the project for that simply right click the project, say add new class and name this class as customer say add. Ship number two, make this class public because if you don't make it public, it won't be available outside the class.

So after that, we will simply create three properties, prop string, customer name. Now, when I created the new class, I named it as customer but you can name it whatever you want, but from a best point of view, best practice point of view to make it meaningful, we made it we named it as customer. Now when we are naming the properties, it is very, very important because we have to name it exactly. Way, the property is the name in customer JSON file its customer name, so we have to name it as customer name. Soon you will understand why we are doing it. Next we will create one more property, string and the name is going to be city name.

Enter. One more property, public string, and we will name it as state name. As you can see, the name of all three properties match with the names of or the names of the keys in the text file, customer name, city name and state name. Next step, right click this references folder and say add reference. Simply click on Browse and say browse again and simply go to C drive program it is x and here you will find a folder called as referenced assemblies reference assemblies, in that open this Microsoft folder framework folder, go to version 3.5. And here you will find a DLL called as system dot web dot extensions dot DLL.

Here it is. Okay simply say add. Now further information DLL is a very simple word it is something which contains some reusable functionalities. Okay, and for reading JSON reading JSON file, we will require some existing classes in dotnet library and we can get that using this system dot web dot extension dot DLL. So you are not required to worry much about it I will show it to again, simply say, right click this folder and say add reference and then click on this browse and browse again. First you have to select Browse in this left section, and then you have to click this browse on lower section.

Then simply go to C drive program 86 and go to reference assemblies reference assembly folder, go to Microsoft holder, framework folder, version 3.5 folder. And here you will get this system dot web dot extensions dot DLL. We're simply say add and say, OK, now you will see that reference our system dot web dot extension is in your project. NET next step, we will change this namespace. It's very dangerous right now. So let's make it something meaningful is the SSIS.

Json project something like this. Now, let's copy this namespace. And let's go to main.cs file and in the top, we will put two using statements. One is using SSIS JSON project. And second one is using system dot web dot script dot serialization. So step number one was creating this customer class.

Step number two was adding reference. Step number three was named renaming this namespace Step number four is adding these two using statements. Now in the next step, we will read the contents of the txt file and put it in the readable format. And for that we will require two more namespaces. So let's press enter here and let's put using system.io, semi colon, and one more namespace that is system dot collections dot generic. I'll soon soon explain why these two namespaces are required.

Now let's come down. And in this call new output rose. let's declare a simple variable called string file content. file. contain. And here we will say file dot read all text.

Now if you see this file class, it's coming from system.io namespace and that's why we are added to this using system.io namespace. If you remove it, you will see that it says c file does not exist in the current context. So, in order to work it properly, we have to put the system.io in the top Next we will say put a path here part of the file, which is nothing but this one simply paste it here and replace single slash with double flash, because that is how path will be returned. And we will say slash the name of the file name of the file followed by extension. So, now, this file contains contain the actual contents of the file. Next simply create the instance of two classes one is JavaScript serializer equal to new JavaScript serializer and second one is list of customer cusd Waimea customer.

Now, if you check, then you will find that customer class is coming from SSIS JSON project namespace list Class is coming from system dot collections dot generic namespace and JavaScript serializer class is coming from system dot web dot script dot serialization namespace. So that's why we have these three namespaces in the top. Now, let's go and continue with our code. So we will say customer. And then we will use this serialization class instance, which will expose couple of methods like the serialize. This the serialized method, check the extra description it says, convert the specified JSON string to object of type T, we have JSON string with us that is file content.

This D serialized method, let us convert this file content JSON string to this list of customer. So we have to say less than greater than and in between this less than and greater than put this list of customer. Don't worry, worry about the syntax. Just practice a little bit and you will get used to with it. Then we have to pass this file content. And then now this customer sorry, let's make it customers.

Now this customer three presents the dotnet representation of the actual data. Next we will prepare the output buffer from our script component for that, we will first loop over this customers so for that we will say for each customer Cust in customers. Now, it's very clear that this for loop is going to execute three times because our source data remember our text file contain three customer records and each time discussed represent the current record and each property of discussed instance represent the one attribute of the data. So what we should do, we should say create the buffer and attach values to values of this customer to that buffer. So how to do that, simply say customer data buffer We remember that's why we named that buffer as customer data buffer whatever you name it over there. We named it as customer data over there so you simply get customer data buffer here, we'll say dot Add Row, dot Add Row and after that the next good news simply say a dot and you will get all three properties in intelligence Why?

Because inside customer data buffer, we created three columns with the name as customer name, city name and state name and hence life will become very easy over here. simply say customer name equal to Cust dot customer name semi colon, enter customer data buffer dot state name is equal to cursor dot state name semi colon enter customer data buffer city name equal to Cust dot city name, semi colon, that that's it. This is what we have to do here. simply close the Visual Studio. say okay and we are done with it. Configuring our script component as source.

So it's time for destination now, simply drag oledb destination to a data flow designer Connect script component to oledb destination, double click it, we don't have any connection manager. Now what we can do is we can simply click on new, and click on new again. Put the server name it's dot in my case, database is script component dB, say okay, okay, now data access mode, let's keep it as table name of the table. Let's keep it as TBL customer. Let's go to mapping. mapping is already there because names in the input column and names of the destination column are matching.

Simply say, OK, now let's go and execute the package. package is executing. It's done. Let's go to SQL Server. Simply refresh the table and you will see that records are here we successfully loaded data from our JSON file to the database. Now, we have successfully completed our first lab where we use script component as source.

Now the next task is we will use script component as transformation. And trust me in real time scenario that is the most use scenario, we will most of the time we will use script component as transformation. It's rarely we will use script component as source or destination most of the scenarios will be with transformation. After that source will comes to picture and finally destination will come into picture. For our next demo, we have one more table created inside the same database called TBL customer source. This table contain two columns customer name and address.

Let me repeat this table contain only two columns customer name and address. Let me show you the records inside this table. As you can see, address column contain Both city name and state name in combination separated by comma. Now, our task is load data from this table to this table. Now, there is a problem customer name is fine, but how will we distribute this address value between city name and state name and there is no transformation I will able to achieve this as well. So what we can do is in this situation, we can use script component and we can use it as a transformation tasks this time and achieve the expected result.

Delete all the records from this table. Now, let's go back to Visual Studio. Let's stop the execution. And let's rename these Dataflow tasks to script task. Script source task and Let's try one more data flow task and let's name it as script trance for mation. Now simply double click the task and drag oledb source and double click it and select table as TBL customer source.

Make sure that all the check boxes are checked, say OK, so we are done with configuring source. Now it's time for script component. So simply drag script component and select transformation this time, say OK, and connect source to transformation. Double click the script component and go to input columns. And basically here we will select all the columns coming from the input source which we want to be available in the next stage. Basically, we want both of these columns should be available, done.

Next, go to input output columns. And let's rename this input zero to customer input, which will be more meaningful. And let's rename this output zero to customer output just like we did last time customer buffer. It's the output buffer. So I'll say customer output. Now, inside this customer output we have to define columns, which are which is not right now in the source column and which we want to be available in the destination.

So there are two columns. One is city name, of type, string. And second one is state name of type string. Then, next, go to script section again, and click on Edit script. It will launch up a new visual studio with three methods pre execute, post, execute, and customer input underscore process input row. Now, remember That we are using script component as a transformation component that means multiple records or multiple rows are going to come to this transformation.

And this pre execute and post execute method will execute only once it will execute in the beginning it will execute in the end, whereas, this method will execute once for every row and this row parameter represents the current row it will encapsulate the current row. That means, in our case, this method will execute twice why, because our source contains two records customer four and customer five and this row parameter contains or represents the current row and what it means it means, this row parameter is ultimately going to how to properties inside rate that is customer name and address simple, but there is one important thing. This row parameter also encapsulate the properties, which we defined as output columns in our input output sections. So, it will also contain city name and state name property. So what we will do in this method, we will simply take the address property and we will extract city name and state name from it and update the city name and state name property.

So, first, let's take address data from our robot ammeter. Done. Now, we have to extract city name and state name from this address variable And to understand that, let's go to SQL Server. And as you can see, address data consists of city name and state name, which are concatenated with the help of comma that means, if somehow we are able to split this address by comma we will get city name and state name individually and that can be achieved with the help of split function of sorry split function or C sharp and if you see the description it says it will return string square bracket. So let's create string square bracket address parts and in shisha string square bracket means string array. So if you want to learn more about arrays, then you can follow or learn dotnet in 60 day series, but I will give you a little idea about it Don't worry.

Array means collection of elements of similar data types. Usually, every for every data you want to hold, you will go and create one variable. But what you can do is instead of that you can declare an array variable. And that single variable will get an ability to hold more than one data. So that's why array is nothing but it's a collection of elements of similar data types. And in our case, this address parts is nothing but it's a collection of elements of string data type, this single variable hold more than one data.

We simply split this address by comma and it will return two data, data one city name and data two state name that means this single address parts variable, hold more than one data and hence me See, this address parts is a array the top array of string data. So we simply say take individual data from it and for that we'll create individual variables string city name equal to address parts and get the first data and first data we will get by putting square brackets and zero semicolon. Now, we need second data that is state name equal to address parts, square bracket one semi colon done and now we will use this city name and state name variable and update this row parameter for that we will say row dot city name equal to this city name, semicolon enter rho dot state name equal to this state name, semi colon done.

So we are done with writing our transformation In logic, simply save the code. Close the Visual Studio, go back to Visual Studio where our SSIS project is open simply say OK, and we are done with configuring our script component. Now we need destination. So simply drag oledb destination from toolbox Connect, transformation to destination simply double click the destination and make sure that Connection Manager is set to script component dB. Keep data access mode as table select table as TBL customer go to mapping and you can see that our input contain four things customer name, address, Date Name, city name, two of which are coming because of source and two of which are coming because of transformation. mapping is already done because names of names of input column and names of the name of destination column is matching, simply say, OK, and let's go and execute the package.

But before we go and execute the package, remember guys, in control flow, we have two tasks, one task will take data from text file and dump it into the destination. And second task will take data from our source table and appended to the destination. That means if we simply go and execute the package, we will end up with five rows in our TBL customer table. But this time, we are interested in testing of only script transformation. So let's go and disable this control flow task for now. And simply go and execute the package.

Package execution done. Now let's go to SQL Server. Let's simply say refresh. And what we can see here we can see that both rows are here. So we are done with using script component as a transformation. component.

In the first lab, we used script component as source. In the second lab, we use script component as transformation. Now in third lab, we will use script component as destination. For lab three, let's talk about a scenario. Let's say we want to take data from this TBL customer and dump it into a data source or destination. But the condition is destination should be a JSON formatted text file.

So what to do in this case, because we don't have any tasks available in SSIS data flow toolbox, which will let us do this easily. So in that case, we will use script component as destination and we achieve it. So let's get back to our Visual Studio. And let's simply disable this control flow tasks. Let's drag data flow task, double click it and let's drag oledb source and double click it to configure it keep Connection Manager as script component dB, keep the taxes mode as table select table as TBL customer go to columns yes we are interested in all the columns say okay now drag script component from SSIS Toolbox to data flow designer and select destination as option this time because we want script component to be destination this time, simply say okay and then simply collect oledb data source to script component double click it and then first go to input columns and select what all columns Do you want to be available in the next stage.

We want everything now go to script Again, and simply click on Edit script. Again Visual Studio where we will go and write or custom logic. But this time this logic is going to be our destination logic. We will start with adding a reference of system dot web dot extensions because it will be required when working with JSON. But this time it is not required to browse through the core folder because we already added reference once and Visual Studio is intelligent enough to keep the history so in recent folder in recent recent option, you can see that that DLL is already there. So simply check that say Okay, next task is creating a new class called customer so let's do that right click add new class.

Let's name this class as customer just like before. Let's make this class public just like before, and let's create three more properties sorry, three properties inside this class just like before, first property is going to be customer name. second property is is going to be city name. And the final property will be just like before state name. Let's change the namespace like SSIS. Or let's make it script destination.

Next task, go to main.cs. And let's put the necessary using statements. First one will be using system.io because we are going to create a file so we need this namespace. Second one is using, sorry script destination because we want to refer that customer class. Third one is using system dot collections dot generic because we are going to create more than one more than one customer at a time. And finally using system dot web dot Script dot serialization semicolon.

So we have everything set up. So now it's time to write the actual logic. Now if you look at the structure, it's almost same as the one we had in case of script component transformation. We have one method called pre execute, one method called post execute. It will execute in the beginning it will execute in the ending and one method called as input zero underscore process input row, which will execute once for every row, and which are one parameter called row which will encapsulate the current row. Now let's go and write something here.

We will say, customer Cust is equal to new customer. Enter. We will say customer dot customer name is equal to rho dot customer name, semi colon Cust dots city name is equal to rho dot city name semi colon cost dot state name equal to rho dot state name semi colon. Now next, as we discussed before, we will go and generate a JSON format a text file of same, so let's do it. For that, we will need an instance of JavaScript serializer class. If you remember, in the first lap, when we were we spoke about script component as source we use this JavaScript serializer class and we use the D serialized method of it, which generated a list of customer from existing JSON formatted customer string.

Now, this time, we will use something called as serialized dot serialize method which will take a customer and generate a JSON formatted string out of it. So let's do that. I will say, just next step, go and write this data, this JSON string data in a file. So for that, we will use file dot write all text, which will need a path. So this is going to be the path. So let's simply copy it.

And let's paste it here. Let's replace single slash with double slash, and end up with new Jason dot txt, comma, the content that is JSON. This method will simply create a new file called New JSON dot txt and put this JSON contained inside this. So let's save this code. Let's close the Visual Studio. And let's go back to our SSIS Visual Studio and let's say okay, so we are done with configuring our script component destiny.

Session. So let's go and execute the package execution done. Now let's go to the folder. Here you will find new JSON dot txt file. Let's simply open the text file. And strangely, only one record is there.

And that is second record customer file. Now let's try to understand reason for this kind of output. Close the text file. Go to Visual Studio, stop the execution. Double click the script component, click on Edit script and then simply wait to use will to open the account no just encoding option. Just come down and put a breakpoint in input zero underscore process input row method.

Save the code, close the Visual Studio, say OK, and simply go and press f5 it will hit the breakpoint First, let's Go and check the row parameter. As you can see, city name, customer name and state name. It represents the current row. Now let's go back to Visual Studio, let's press F 10, f 10 f, then f 10, it created the new customer object. Now I'm creating the instance of JavaScript serializer, which will be required to create JSON formatted string of current customer object. Press f2 and again, and it will create a new JSON formatted string f 10.

Go to JSON. Click on this magnifying icon. You can see that a new customer JSON string is here. Customer for Kolkata and West Bengal. Say close this line we'll create a new file called as new JSON dot txt and write down this content. Simply press f5 breakpoint again got hit row.

It contained the second row Chennai customer five And Tamil Nadu before we go and press F 10. This time, let's go to the folder you can see new JSON file is here, simply open that and what we see it contain the first record customer for Kolkata and West Bengal, simply close the notepad. Go back to the source code, simply press f8 and again now the customer name is customer phi. city name is Chennai state name is Tamilnadu Let's press f 10 f 10 f 10 new customer object created, new JSON string got created and we are again creating a new text file simply press f5 what it will do, it will simply replace the prior text file with the new text file and hence the new text file contain the second record. And this is why we are always getting the last record inside the text file because for every row, that process input row method get executes and Everytime we are creating a new text file and hence we always get the latest or the final record in the text file.

Now let's see how we can solve this problem. Now to solve the problem first let's close the text file. Go to Visual Studio, stop the execution, double click script component, click on Edit script. And let's navigate to input zero underscore process input row function and understand the code once again. First we are creating instance of customer object based on current row then we are creating JSON string of it. And finally, we are writing that JSON string inside the fight.

But in the entity, we should create a text file which consists of JSON string of all the customers for that we will create a new object of type list of customer as a global variable customers equal to new list of customer and then we will create So then we will add disgust inside this customer cusd and then we will create JSON string of this customers instead of cast. Let's put the breakpoint, say sale, close the Visual Studio, say okay let's execute the package once again. breakpoint got it. Now before we go and continue our execution first let's go and check this customers like now as you can see it contains zero elements. Now we are creating a new customer instance based on current row. So let's press f 10 f then f 10 f 10 and then we are adding this customer instance inside our customers collection done.

And then we are creating JSON string of this customer's collection. Remember, right now or customer's collection contains one element with a value as with the value as you can see Kolkata customer four and West Bengal. Now let's press F 10. And let's check the JSON string First of all, as you can see, we have one element right now, but the complete string is a collection string. Let's say close. Now we are writing this newly created Jason string inside the file, we will press F 10.

Now let's go to folder. Check the text file. Yes, it contains one element, but we also have the square brackets. Now, let's go close the text file. Go back to our Visual Studio. breakpoint is here.

Let's press f5 breakpoint is again here. Now if you check the customers collection, it contains one element already with one element With name as Kolkata customer for and West Bengal. Now we are creating new customer instance based on this current Pro, let's press f 10 f then f 10 f 10 Let's check this customer Yes, Chennai customer fi and Tamilnadu is the second customer, we are adding second customer inside our customers collection and let's press F 10. Now we have customers collection with two elements because one and two Kolkata customer for West Bengal and Chennai customer fi and Tamil Nadu and we are creating JSON string of this customer again. Let's check the JSON string as you can see, it's an array but with two elements one and two. Close we will say f 10.

Again, it will create a new text file or we should say it will just replace the existing text file With this new JSON string contained, let's go to the folder. Let's take check this text file. As you can see, this is what we are expecting correct. Now let's go to our Visual Studio again. Say phi, we are done with the execution and we have what we are expecting. But there is a small problem right?

The problem is why we are replacing, we should write our logic in such a way that file will get created only once, but with all the contents. Now, let's see how we can achieve that. Close the text file. Go to Visual Studio, stop the execution and double click script component. Click on Edit script. Now simply navigate to input zero underscore process input row function and here we will simply remove the breakpoint first and then we will simply cut this code and we will put this code in this post execute method.

Remember, this method will execute in the end. So let's put a breakpoint here. And let's take this list of customers from here, and let's put it here in the top. It's same. Now let's put breakpoint here. Let's put breakpoint here, save the code, close the Visual Studio, say okay, execute the package and wait for a breakpoint to get hit.

So breakpoint Godhead. Here we have customer and here we have list of customers define. Let's check the customers first. It's empty correct. Now, we are trying to create a new instance of customer based on current row. Yes Done Done.

Then we are adding that customer inside the customer collection but we are not creating the file. This time, simply press f5 and you will see that the breakpoint got hit again now, this time we are creating the second instance of customer. Please note that right now customers contain only one element, but as soon as I press f 10 f 10 again, you will see that now customers contain two elements, simply press f5 and you will see that at the end this post execute got executed. Here we are creating a new text file, simply press f5 execution done let's go to our folder. Let's open the new JSON dot txt file and here is the content we are expecting. So, this is all about script component in data flow tab we we have learned how to use script component as source as destination and as transformation.

I know this practical is little bit a bit practical and a little bit complicated practical but you If you practice it two or three times you get used to with it, and it become very easy for you. At any moment if you found any difficulties or any questions, please feel free to drop a mail to correspond at the red correspond.com. Thank you very much

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.