Lab 9 :- Creating SSAS Cube. (SSAS)

MSBI Step by Step Training Lab 9 :- Creating SSAS Cube. (SSAS)
40 minutes
Share the link to this page
Copied
  Completed
In this lab we will be Creating SSAS Cube.

  Download

Transcript

So, let us start with lab nine and in lab nine, we are entering into ssps. So congratulations to you and as well as to me, because we are starting with the next leg of MSP if you look at the MSP a professional market, if you ever meet msbi person, you will find that he is either good in SSIS or he is either good in SSRS but you will find very less professionals in SSIS. And I think this is because of the way software projects are organized. If you see software projects, every software project needs reporting. So SSRS becomes kind of a compulsory component. Every software project in almost every software project has data coming in some kind of massaging happening.

So again SSIS becomes a need of the hour. But when it comes to to analysis, to forecasting, you will find very less application of these nature's And that's why SSIS is useless, but certain done, when you say you want to create a OLAP application, when you want to do heavy calculations, forecasting analysis, you cannot do without SSIS. So, when SSIS is needed it is needed very badly. So, if you remember in the first video, we said that SSIS is all about analysis, you know, the main goal of SSIS is analysis. Now, for a moment assume that, you know there is no SSIS and you are supposed to do analysis without it, so, how will you go about it? So, in other words at this moment, if you see all the data is logged in into this table, so, by using SSIS the whole data is in this table, you know which is in this database, which is having that star schema or snowflake design right?

And on this we want to do analysis. So, when we say analysis so analysis means We will do some kind of calculations right some kind of calculations like sum count or must be it can be a big formula right. So, for example, now let us say I want to just go and see the see the total sum of amount right. So I'll say okay select some of customer amount right. From So, actually you will go and write such kind of SQL right from fact customer and I didn't say execute okay. So, it gives you a sum right?

And then probably would like to do different kinds of analysis saying Okay, can you give me some as per the date Can you give me some as per country, can you give me some as per country and product right. So, the analysis you know, can we keep growing and not only will be just simple aggregate functions, it will be also some Big formula right? Now think for a moment you know, if we if we follow this approach, how does it work? So, you have the data here in this table on that you go and run the analysis. So, every time you find this SELECT statement or whatever right, it will go, it will calculate from the database and then you can show this on your report and these calculations will happen again and again and again. Again, I repeat, you know, you have the database, you know, wherein you have your data warehouse on that these formulas will run, they will calculate dynamically right, and show the data on your report.

Now, think for a moment that this database or this fact customer is having trillions of records, trillions every time if you are doing calculations on the fly, it would become extremely slow. If you for a moment, the calculation logic, whatever some count, whatever formulas you are running are actually dynamically calculated. If we can change this dynamically calculated to pre calculated, then that would really speed up the whole process. In other words, if we can do something like this, if we can have some kind of a process right at the background and do this process, we say, Okay, I want to run this formula, I want to do this, I want to do this. And what he does is he goes, you know, after every minute, or whatever you can think about or every time a record is added, he goes, he does all the calculation and stores those calculated analysis into a separate database, and then we query that database.

In other words, if we can move from this dynamically calculated Do a pre calculated architecture, then that would make things really fast. And that's what exactly ssss ssss is nothing but it's a pre calculated database, you can think about SSIS is a software, which helps you to define these three calculations. It runs at the background, and stores this tree calculation into a separate database. So if you, if you go to the Object Explorer here, right to go to Object Explorer, you have to click on View, and you have to go to the Object Explorer. And if you go and click on this collect, you can see that your RDBMS that means you know where your data is stored, that is your tables, your rows and columns. Those are actually over here into this database engine.

The analysis service, as you can see is a separate database altogether. So if you go and connect over here, you will see that it is a very separate database altogether. And if I go and zoom, let me quickly go And zoom here. So we'll start my zoom it over here. There it is, I've zoomed in, you can see that this slender sign over here indicates that this is our DBMS this cube sign here indicates that this is Analysis Services, right? And if you remember we talked about about the cube right in one of the previous lectures, right?

So you can see, Analysis Services is nothing but it's a pre calculated database, which stores your pre calculation. So, in other words, from this RDBMS it the process of analysis services would run and it would save it into this SSIS database. So from our DBMS You know, it will actually take the data, do all your calculations and store it inside this analysis services or inside this cube. Storing p calculations into SSRS database is a two step process. First is we need to go and create a project we need to go and define, you know, what is a calculated field, what kind of calculations we want. And the next step is to run that project at the background to go and fill this database.

So in order to create the project again, we need to go to our to our tool here, right? And over here, we need to go and add a SSIS project. So let me right click on the solution here and say add a new project and this time we will select SSIS. So that is slowly slowed us. Okay, so let me go to business intelligence. I'm so sorry.

Sometimes I keep wondering where it is, and to Analysis Services. And you can see over here in the analysis services Now there are two ways of creating a project SSIS project one as you can see there is something called as multi dimensional and the other one is something called as tabular okay. So, for now select multi dimension I will talk about the difference between tabular and multi dimension later on right. So over here I will say SSIS customers so this is actually a project you know where we are going to go and define our three calculations and create the cubes. You can see now, we have one solution, and in that we have two projects. So, one project is for SSIS and the other project is for SSIS.

Remember, in Visual Studio, the way the project structure is at the top you have a solution and then you have projects inside it. Now, if you expand this SSIS project over here, you can see that there are some items of folders here. Okay. Now, the way you should Configuring SOC s project is you can see that over here you know there are some folders like data sources data source views, cubes etc. Now, the way you will start filling is you will actually start from the top and you will go to the bottom to fill. So, in other words you will first start with data source then you will fill the data source view then you will fill the cubes and dimensions.

So, in other words you know the way in SSIS we go and we start configuring is by starting from the first icon over here of data sources and then we go down below. So, let us click on Data Sources first. So, what exactly is a data source? data source is nothing but you know here we need to go and define from where this SSIS project will be fetching data for pre calculation. So, at this moment, the SSIS project will fetch from this database right. So, we need to go and define we need to give this database Engine server name over there, right?

So let me go and copy. So I'll say database engine and copy this. Okay. And then we go here. So I'll right click on this data source, I will say new data source. So you can see that it starts a wizard here and say next.

And here I will say new. And I will give that server name which I've just copied Ctrl V, the database, you know, where our, our data warehouse is, at this moment, you know, where the SSIS project has loaded the data is this one. So I will select customer data warehouse and say OK, and say next. Now, when this cube when this SSIS project runs at the background, you know it, it needs access rights to go and modify the database it needs to probably, you know, create some things in your local hard drive, etc. Right. So he's saying that Can you give me some account here by which I can go and start doing this activity on a computer right?

So at this point, what I'll do is I will give you my administrator account here so, so that I do not have any problems you know, when I run the cube, so I'll say next finish. So, I've configured the first folder here data sources, in that I have have defined you know the database from where our cube will get the data. Now, the next thing what we need to define is the data source views Okay, what is what exactly is that? Now, what happens is, let us say you have a database for example, like the way we have, right, but in the database, you know, you can have lot of other tables as well, for example, at this moment, to create this cube or to create the analysis. We only need this 12345 tables here, right? But you can see there are lots of other tables also like the like The system tables now I don't want these tables to be included in my SSRS project right?

So this data source view is nothing but you know it gives you a choice saying that okay from 10 tables you know how many tables you want to select right so I'm going to go and create a view here and I will say that yes I want him country I want him product I want him salesperson I want this I want this but I don't want the CES diagrams. Okay, this is diagrams is a database. It's a table where it stores your system diagrams, what do you have drawn? Okay, your ER diagram of SQL Server, so I don't need that I need only this. Five tables at this moment I'll send next and also finish. You can see that we are filling from the top to the bottom.

So you can see that we have failed. The data source. We have failed the datasource view now it is time to go and fill the cube. And before we fill the cube, very quickly let us do a review of our data source view because I can I see that there are some problems here you can see if we remember, we had created the primary keys and foreign keys right. So, you can see the data source view is very smart, you know, depending on the primary keys and foreign keys, you know, what he has done is he has created the relationship and ssps as well. Now, you have the full authority here to go and delete these relationships here because it is very much possible that in our DBMS you need the relationship but in SSRS you need a different kind of relationship or probably you don't need it.

So here this data source view helps you to decouple your physical database structure architecture from what you want in SSIS Right, that is one and second we can see that because I did not define I think some of the tables primary key and foreign key it has not made the relationships here. So, what I can do is, I can go and set this as a primary key, I can go and set this as a primary key and I can go and define the relationship here. Now, you can see this relationship, you know, what is there should be the opposite side. So, but you can see that this relationship or I'm defining here is only getting defined inside SSIS if I if I go and save this, or is that, okay? If I go and save this, it is not going to go and affect my SQL Server data structure.

Right. So again, a very great point about data source view, you can have a different logical structure in your data source view and your physical structure can be of a different view altogether. So now the next thing, what we need to configure is the cube The cube is the most important part in SSIS it is the whole purpose why SSIS exist your final calculations, what you do your analysis, your forecasting your output of the formula whatever is a calculated value get stored in the cube. So, you can think about you can visualize cube as a structure, wherein in the center of the cube, your analysis your forecasting your numbers your facts, your measures, what you want to term will be stored and on the sides of the cube is the dimension the string in on which you want that measure to be evaluated right. So, if you visualize the cube it is nothing but at the center you have the number which will be stored, and on the sides of the cube are the dimension.

So, for example, at this moment, our cube will have the amount as the center right and the dimensions by country name it will be Product Name it will be salesperson name, states name and so on. So, let us go ahead and start configuring the cube. And one more important point to remember here, when you are doing a SSIS project, your database structure should be of the star schema or the snowflake design, if you're not doing it in that way, then your cube can have a very bad design right. So, it will be very difficult to create the cube because the way cube thinks is he thinks measures and dimensions. So if you are not designing your database in terms of snowflake and start schema design, then creating the cube would be very difficult okay. But at this moment, you know we have designed the database using the star schema design and I think you know, we should not have problem.

So let us go ahead and create the cube so we'll say right click New cube next. So it says that so do you want to create an empty cube or do you want to use the Existing tables to create the cube? Yes. So I want to actually use the existing tables. So you can see this result here says, So tell me, what is the fact table? Or what is that table where your number is stored?

Or what is that table in which you have your analysis data, the number of data, right? So my fact table is actually this one fact customer, so I'm going to go and select that. Remember, measure is a synonym for fact. Okay, so I've selected my fact table Next. Next, so he says that Okay, so if that is a fact table, then are the other tables dimension. I see.

Yes, all the other tables are dimension, but not this one. Okay. Let me quickly check. So we have selected fact customer, right, and we have selected all the dimension table. So we'll do a next and we'll do a finish now Let us see what has happened and what what things he has created. So, first thing if you let us start with our solution first so in the solution you can see he has filled the cubes here, right the main cube is here and all and automatically the dimensions are filled because in in the cube wizard we have specified the dimensions as well right.

So, in other words, if I want to go and configure the dimensions, I can double click on this and configure the dimensions over here. Or or if I want to go and configure my cube then I need to go and double click over here. So you can configure the complete cube or if you want you can configure the dimensions as well right okay, but let us go to the cube tab, okay. So let us double click on this cube and let us see that what kinds of things this cube has. Okay, so if you look at the cube, you can see that there are lots of tabs at the top. You can see I'm moving my mouse here so you can see that there is a cube structure that has dimension that is calculations and a lot of things but for now, don't get confused, relax yourself and just concentrate on two tabs at this moment, cube structure and browser.

The other tabs we will you know complete as we go in the video series ahead. So at this moment, concentrate only on two tabs cube structure and browser. Again, again I'm repeating if you want to configure the dimensions you will click on this click on the respective dimensions. If you want to configure the cube you will click on this you can see the views of both of them are different okay. So at this moment, we are configuring the cube. So, if you now see this cube structure here you can see that at the top we have measures so let us quickly go and check once or measures if they are proper or not.

Yes, that is customer amount. And he has also included one more field called as count. Okay, so you can see now when I go and process the cube right, this count will be pre k collected stored okay and down let me just go and check the dimensions yes these are proper the dimensions are here in all okay right. So, now that you know this structure has been defined here okay it is time to now go and run this cube or process this cube IBC okay. So, to process the cube you can see that there is this process button over here. So, let us click on this and it says that the server content appears to be out of date Yes it is.

I will say yes. And I will tell him to make it in sync. See, what does he mean by that server content is out of date server content out of date means that he says that on the SSIS side the on the on the on the RDBMS side there is some data and on the SSIS side it is not replicated. So it says that the contents are out of sync or out of they are not in sync Right, so that's fine. So that was just a warning. So I've said yes.

And you can see that he opens up this process cube tab here, we will go and see all these buttons later on, but for now, just go and say run okay let us go and build this cube or process this cube and let us see the output. So that you can see, if the cube has been processed, you will see this message here saying, process succeeded, I will close this. And the next thing, what I need to do is I need to go and see that how this cube is looking right. So, for that we have this browser tab here. So, let me go and click on the browser tab, so that I have clicked on the browser tab. So, you can see now, the measures and the dimensions are seen over here, right.

So, for example, if I want to go and see the total sales that has happened, I can just drag and drop this customer amount here. It is displayed. If I want to go in and go and see country wise then I can drag the country data here if I want to see product wise I can also drag the product over here. So, you can see now, how the cube is so flexible, right, the center is the amount your measure and on the dimensions or the on the on the sides of the cubes are the dimensions that we can see one thing which are noticed here, you can see that the dimension is just having the ID it is not having the string, right. So in other words, we need to go and configure the dimensions to include the country name and product name. So let us go and configure the dimensions.

Remember to configure the dimensions, we have to click on the dimension files. So I'm going to go and click on the dimension file. So you can see here, it says that okay, at the left hand side, you know, these are the attributes. And on the right hand side on your data source. It also has the salesperson name, so I need to go and include this over here. So I'm going to go and save this.

Yes. So in the same way, let me do it for the rest of the dimensions as well. So double click on this and include it here, save it. Right. And let me do it for all the other dimensions. Right country and pull out right.

Now let me go and, again, reprocess the whole cube. So again, to process the cube, you have to click on the cube structure, you have to click on this button here called as process. Yes, it is out of date. I want to make it in sync. And then that tab comes up and you're to click here and you can see there are some errors over here. And we need to see what those errors are.

So, if you read this error over here, let me let me do one thing, let me copy this and let me put it in a notepad. And let us read this error very carefully. It says that the country ID has a duplicate key. Right? So very nice. That means SSE s works.

Okay, if you remember, what we had done is in the database structure, and it is good that these errors are coming, because you should know how to see the errors, right? So if you go and see our database structure here, inside SSIS, if you remember, we had created a primary key on country ID. Right? And if you remember, this country ID and the product ID specifically, both of them were configured for a CD. If you remember in our SQL Server, the country Table did not have a primary key and why did not have a primary key because when we do a CD, we can get duplicate records, right? So that's why we did not kept the primary key over here in the country ID so if you remember, the point was that you know, we can have duplicate records here, for example, you can see that three error is coming from here, you know, three is for USA, which is an old record, and three us is the new record, right?

So the problem here is that we have made it as a primary key, but in the actual database, it is not a primary key because we have enabled a city. So now we want to have the benefit of both the worlds means definitely, you know, when it comes into SSIS, I don't want the duplicate country ID to come in. That's the first thing but at the same time, I want to ensure That, you know, this is a primary key as well, right? So what we should do in this case is we cannot take the database structure directly, right, we need to go and write the SQL and that SQL data, we need to connect as a dimension. So in other words, I'm going to go and delete this table from DSV. Delete this as well.

Right? So in short, what do what kind of data I want from here? I want from your definitely, so very quickly from the country table, I want the country ID Yes. So I want the country ID. Right. I also want the country name.

Okay, from the table dem country, but I don't want to expired records. So where is new is equal to one. So I only want this, right? So and if I take this, then I won't get a duplicate value here, right? And this is what we actually want. We don't want to give old stale data to SSIS.

So I'm going to go and copy this query. So whatever data is coming outside, this query should go into SSIS. Right? So I'll go back here. I will say add a new named query. Okay.

So I'll say I'll add a new named query. So I would say this is my country. And my SQL is this which I just did now, right? And I will say, Okay, I can see over here, that this thing what is coming out over here is actually query, this one is a physical table. So if you could just go and zoom here you can understand from the icons you can see here, this icon represents a query. And this icon represents actual physical table, right?

But yes, now I don't have any any harm I can make this as a primary key and here it is, right. In the same way, we also need to do for the products as well. But no product is bit different. If you remember, in product, we had used a CD, but we had used with the date and time, right. So in other words here, I need to take definitely the product ID I need to take the product name and all that. But I would like to see your where date expired is not null.

So whichever dates have expired, right. They are having none So it is not I'm sorry. So yeah, is not so in other words, you know, wherever the date expired is null, that is the current record, remember a city in case you don't remember, please go and see the city video. So I'm going to go and copy this. And I will go and again, add a new named query. And now this is dim product, okay.

Ctrl V, the same process what I have done with country set it as a primary key. So, the logical primary key actually, and keep the things right. Now again, I need to go and reconfigure my cube because I've changed a lot of things right. So what I'm gonna do is I'm going to go and delete this cube completely from here. Remember in production, it's a very bad idea to delete like this. I will explain you later on, you know, what is the proper way of configuring cube when you go to production, but at this moment, let me do the cruel way which is really not good.

I will not advise this While you're on production, but at this moment, you know, our goal here is to just create a first cube. So I'm not so serious in terms of you know how to apply the changes in a proper manner. So deleted the cube as well as I've deleted the dimensions. Again, I'm going to go and repeat the same steps what I've done previously, so this can probably just become a revision for you. Right next finish. And also over here, no, I need to go and get those names.

Yes. Straight name, country name, because we want to see the text as well as save it. And now I will go to my cube here and I will hit this process button. The dialog box will come up and this time we hope that we should not see any kind of errors. So, we should see everything green and we are not seeing things are green. But again, let us not run away from these errors.

So, let us copy the selection and it has put it here, let us read it. So, it says that product ID value one cannot be found. And that is right, you know, because what's happening here is, if you see in your customer table, you have a product ID one, right. But actually the product has expired. You can see the product has expired over here. So, because of that, you know, he's not able to make a relationship, right.

So in SSIS what happens is only these three records have selected 234. And in the fact all the records are selected, so he gets confusing that what is happening over here, right? So what we'll do is for now, let us not expire this record So I'm gonna just make it null again remember, null and nothing and zero are different things okay. So, you can see here, I did a null here by pressing Ctrl zero. So, you go to the column and you press Ctrl zero to put a null right. So, do not put space if you put a space it is not a null, do not put zero it is not another 1990 or 1800.

If you backdate it is not a null a null is nothing and null is a null. So I put a null here. Now let us go back and let us again reprocess let us see if we have errors, remember errors will come you need to copy it, you need to paste it, read it, understand it and fix it and there we are. Happy to see the success right green. Good. So that is nice.

Now let us go back. To see the data. Remember to see the data we have to click on the browser tab. So let's click on this now Yeah, so there it is. And so, let us first take customer amount country name awesome with product with salesperson name right, which states name you can see that is something unknown here this unknown is because in the fact customer there is one record which has null. So that's why they're unknown and it is good to see that right.

So, you can see now, how the cube is created centered around this number centered around this fact. Now one thing, which probably you would ask me is that Okay, so what is the pre calculation here? Where is the pre calculation happening at this moment? I know that at this moment I have not specified any formulas. I have not specified any aggregation, but still Pre calculations are happening what is the pre calculation? If you see, let me go and clear everything here.

So, let me clear this grid when you go and drag and drop this customer amount here, he does not calculate by some, it is told, probably, I don't know the interval data structure, but it is stored there as a value like 4464. So, it just goes and says okay some take this value displayed, it does not do any kind of calculation, the time you go and display product name like this, all these values are pre calculated and stored, the time you go and say country name, he has taken the combination of product name and country name and made an entry pre calculated in the SSIS database. So, in other words, he has taken that structure in a fact and dimensional format so that he can read every dimension and every combination of the dimension and store the P can calculated value. So, in other words he is not going to go on runtime and say okay for us how much is the sale already the value is calculated as for us and stored also the combinations has been done for example, he will not go and say okay now take pants and us and tell me no already this value is calculated, because he knows now, which is the measure table and which is the dimension table.

So, this is a very important thing, when you are talking about OLAP when you're talking about SSIS your database design should be in a star schema or a snowflake design, it is not in a star schema snowflake design, then you won't get such a beautiful pre calculation. Also, I would like to discuss a small loss it's a problem but it is a constraint which you probably would need to remember or need to understand when you're proposing SSIS solution because a lot of time you know, this thing can become a deciding factor if SSIS is a good solution for your problem. Not the problem is latency. If you see at this moment, the way ECS architecture is your data actually comes into this database here into the holy TB database right afterwards. So, this process runs here the cube structure process, the the queue process runs here, and he loads this data from this database to the SSIS database.

So, it is very much possible that both of these two databases can go out of sync for some time or there can be a latency. So, in other words, as soon as the record gets added here, it will not be available in the SSIS cube, you need to run this process to make it available. So, this can be one of the factors you know which which probably can decide if SSIS is a good solution or not. So, you need to make the customer aware that this data what is coming from the cube is not real time you know there is a latency which is there in between Now one of the things probably which could have come to your mind is because the same question I had asked to my teacher when I learned msbi that if SSIS is a database, then what is the language?

What is the query language we are using? If you look at our DBMS we use SQL. So in the same way does this accs also have some kind of a query language? And the answer is yes, it has. If you see at this moment, you know when we were retrieving the data inside this browser tab, internally, you know, it is actually creating a query you can see here. Now this query is not SQL.

This query is MDX it is a multi dimension query. Okay. So we'll talk about this MDX later on it needs a dedicated video. But I would like like to make you aware that MDX is the query language for SSE s cube. So in other words, now what you can do is you can go and publish this cube And any reporting tool which understands MDX can fetch there can get data from here and displayed on the UI. So that makes SSRS completely decoupled, you know from the reporting tool who's going to actually take the data from here.

So that brings us to the end of this video. And by this video, we have completed approximately six hours of a training and we have 26 hours still pending. Right and in this video, we saw what is the importance of SSIS that is basically pre calculation, we saw how to go and create a cube, we saw that how errors are seen seen in the cube and how to fix it, how to see it. And we also talked about some concepts like Latin see and MDX you know in this video. So I hope that you enjoyed this video. Again, it's a very humble request that please go to facebook.com slash questpond and let us know on which video you are What you are doing, what are the issues you are facing?

And are you really liking this video series or not? Because the more you put the feedback the more rigorously we record for that series right. Now the next video is again on SSIS at least let us try to do some more SSIS here and again then we will shift back to SSIS. So the next video is on time series. Okay, so the next video basically will talk about you know how to do reporting on on a date field. Okay.

Thank you so much.

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.