Lab 36 :- Data Mining (Fundamentals and Time Series Algorithm).(SSAS)

MSBI Step by Step Training Lab 36 :- Data Mining (Fundamentals and Time Series Algorithm).(SSAS)
30 minutes
Share the link to this page
Copied
  Completed
In this video we will demonstrate the concept called as Data Mining.

Transcript

In this video, we will discuss and we will also demonstrate what exactly is this concept called as data mining and how to implement the same by using msbi. So, first thing let me congratulate you that you have completed a 35 Labs, which almost amounts to 20 plus hours of training and I understand that how stressful it would have been to reach here. But again, you know, let me tell you the hard fact that whatever you learn till now are just tools and components. I know that you know, how to now do ETL I know that you know, now 2030 components in SSIS. I know that you know, how to build a cube, you know, how to display reports. But you know, these are just tools.

This is not the real business intelligence. real business intelligence is about prediction. It is about predicting numbers. For example, you know, let us say you have a company sales, sales figure, you know, yes Wise. So, you have like saying okay, in 2012 it was 1000 in 2013 it was 2000. So, what will happen in 2013?

So, basically, the real business intelligence is about predicting numbers and giving a kind of a forecasting to the management, so, that they can make better decision making. So, now, let us understand in order to do prediction what what things we need the first important thing what we need to do prediction is lot of good genuine historical data. So, the first thing for prediction what we need is a lot of good genuine historical data. So, that is the first thing. The second thing what we need is algorithm algorithm, which will actually go and run over this historical data and give out the prediction output right. Now, this algorithm can vary as per our needs for example, let us say that you want to predict in a year why something Right then you will use some time series algorithm you want to go and do something else you will use neural algorithm.

So, algorithms can vary, but the first thing what we need is a lot of good genuine historical data genuine word is very important okay. And on that you will go and you will run this algorithm and this algorithm will give you the predictive data and then that predictive data you can display on SSRS report or whatever reporting tool you want. Again, one more important point I want to stress here, I am talking about algorithm I'm not talking about some count average, definitely, you know, to create algorithm you need all this, but some count average are just aggregations. So, you know, don't think about that you know, sum and count and and average and group by is actually algorithm algorithm means, there is some kind of a logic, you know, that logic actually goes and hunts information from this historical data. Now, again, let me elaborate more about these algorithms, these algorithms have to be chosen as per requirement means you know we do not have a general purpose algorithm you cannot have a general purpose algorithm, these algorithms have to be chosen as per the requirement for example, you can see that on the screen I have put two requirements before you in the first requirement the customer is saying that I am having certain sales figures for the previous year, you know what will happen in the next year.

So, for example, you can see, he's saying that okay in 2001, I had hundred, then in 2002, I had 200. So, what will happen now in 2003? Right, so, this is one requirement. Look at the second requirement. In the second requirement, the customer is saying that basically I have a male who's ages 40 and he's actually trying to buy shoes. So, can you tell me that what other items can interest him?

If you see both of these requirements, you know, here are the things What if What do you want to predict are very different. For example, in the first requirement I will go and apply something called as forecasting algorithm, because here I am trying to forecast for the second requirement, I will go in applies apply like associative algorithm associative algorithm means, when a person buys shoes and he is age 41 then from historical data will hunt that what other things people have bought previously. So, for the first one I will use forecasting algorithm for the second one I will use associative algorithm. So, in this way we have a lot of other algorithms as well. And these algorithms had to be chosen properly. Now, officially, this whole process is also termed as data mining.

You know, because what we are doing here we are going inside the historical data or algorithm is going like a James Bond, like a spy is going inside this historical data and he's trying to hunt information is trying to mine information and bring that information to the end user right in a intelligent way. So, this whole process is also taught That's data mining. So what we'll do is we'll start with small examples, simple examples. First, let us see that how msbi helps us to do business intelligence helps us to do data mining, right? So, what we'll do first thing is because we said that we want to do data mining, right? We need to have some data right?

So what I'm gonna do is I'm going to go to my database what we have our customer data warehouse, and let me create a simple table. We already have a lot of things over here, but let us let us not use the existing project at this moment, let us create a separate table and first learn the basics of data mining, and then we will see how to take it inside the project. So let me go and create a table here. This table is nothing but it is a simple sales table you know, wherein I have a year, right? And the total sales for that year so this table will have only two fields, sales here, and sales amount. So this table will Have a sales year.

I'll put this as integer for now. So this field will have the sales here. And it will have for that year, the sales amount. And so this is a money type Control S, I will just say TBL sales table. Okay. And then let's go and put some data inside this table here.

So I'm going to go and put some data here. So let me edit this table. And let us put some records inside this table. So that I'm entering some data, year wise. So basically, for every year, what is the sales done? Right?

2005 Whatever. Right? So I'm just putting some historical data so I have some historical data here, which I'm inputting. So now let us say we have historical data till 2000 Six and we want that what will happen in 2007 right? So, you can see that I put some historical data right. Now we would like to go and predict for the next year.

So for example, till 2006, I have the historical data, this is the way the sales have been done. So now, can we have some kind of prediction for 2007 2008 and 2009. Right. Right. So, let us go now again back to our visual studio that means to our SQL Server Data Tools. So, we already have three projects here at this moment, we have SSIS, we have SOC s and for SSRS let me go and create a fresh project for data mining because I feel that data mining needs a special attention.

Right. So, let's go and create a separate project for data mining. Actually, a data mining project is nothing but the SOC s project. Okay. But it is not compulsory that in data mining, you have to create a cube. Okay.

So what I'm going to do is you can actually go and do data mining Light here itself you can see there is something called his mining structure. So inside SSA as you can, you can definitely go and do data mining, but it is not compulsory that you need to have a cube. So what I will do is let me go and create a new project and I will name this project as data mining. Customer Data Mining. Okay, so I'm going to go and create Analysis Services project, I will name this as customer data mining. At this moment, we have created a SSRS project, but remember that there is no compulsion that your data has to be stored in a table or it has to be stored in a cube.

Definitely storing in a queue what happens you know your process will be faster because when you store data in a cube, it is pre calculated right? But there is no compulsion that you need to create a cube. But definitely in professional projects, you should first create a cube and on that cube, you should run the algorithm So, remember cube is different and mining algorithms are different. So, this folder here is different this folder here does not depend on cube, but having a cube is definitely professional and it has advantages. So, for now, let us go and first add a data source. So, remember our table TBL customer is stored in that customer data warehouse right.

So, let us give the server name and so, this is all standard what you have been doing in SSIS project right. administrator says all standard what you have been doing till did allow and let us also go and create a view. Now we only have one table at this moment remember the TBL sales table so, we are not going to go and pull up all the other tables because we don't intend to Create a cube right so I'm going to go and just select this TBL sales I will say next and then say finish. So, at this moment my data is stored into the RDBMS table TBL sales right? And on this you know, I would like to go and run my algorithm so I don't have anything complicated at this moment and our TBL sales has these records here right. So, let us go now to the mining structure right.

And I will say add a new mining structure next. So the first thing you say is that okay, so where is your historical data stored? Is your historical data stored in our DBMS or it is in a in a cube. So at this moment, you know, it's not in a cube it is in our DBMS. The next thing what he asked is asked is saying that so tell me which algorithm you want to run. So you can see there are lots of algorithms here which they have given.

These are all ready made algorithms and we are going to go through them one by one. At this moment, you know, we are going to go Select time series. If you see whenever I select algorithm for example, it says select sequence clustering down below there is a description and in the description it tells what that algorithm does for example, this is a sequence clustering algorithm. So, it says that it is a combination of sequence analysis, which identifies clusters of ordered events means, what for example, now, let us say you know a lot of customers are coming to your shop. So, let us say that is customer number one he first calls then he makes an inquiry then he comes to your shop and buys a product, there is a second type of customer he first sees the ad banner after that he comes to your shop, there is a third type of a customer who directly comes to your shop.

So, basically you would like to now know that okay, what kind of sequence have been used to make the sales the most right. So this algorithm is related to that. But at this moment, you know what our intention is to focus Cost right this forecasting here is using time right it is using time time means at this moment you know it is a year, but you know it is it is a sequence of time what we have right a sequence of year what we have. So, the algorithm what most suits here at this moment is time series, you can see this time series. Basically what it does is it analyzes time related data such as monthly sales data or yearly profits, right. And it is used to predict the values of future time steps.

So, definitely I want to go and run this algorithm that is Microsoft time series on this table here. So, I'm going to go and press next. So, this is the view Yes, next. So, you can see that the next thing what he says is that, so what are the input tables you're going to use for your analysis, okay. And you can see now there are two checkboxes here. One is called as a case and other is called as nested.

We'll talk about both of these two boxes later on for now, just go and check the case table I will talk about these terms cases in case a nested lead from which is related to data mining okay. So just select Next. Now for time series you at least need to input one input is your date and time field you know on which the period is running. So, this date and time field can be a month it can be a year it can be full mmddyy it can be in our minutes second, right whatever. So, at this moment, you know, our our time field is nothing but the year right. So, that time field or I can see the key field you need to go and define.

So I'm seeing here that this sales here is the key. It is a time field. The next thing is so what do you want to predict I want to predict the sales amount, right? So I'm going to go and click on this predict here you can see there is something called as predictable so I'm saying I need to predict this sales amount Now I need to predict this, as well as I want him to read the historical data. So this also becomes the input. Remember, our sales table has some date, right?

So I want that the algorithm should use all of these history and then do the prediction. So I'm saying that this sales amount field here is not only predictable, but also his input has to be taken. Okay. And you can see that there is a small suggest button here, which I will never tell you to use it. The suggest button if you click on it, it automatically tries to go and see the data types and tries to define the key and the input predict table automatically. But I would suggest never use this.

Use your own logic. You know, the business well, you know, the domain well, so don't use this button which is here. So I'll say next. Now, these fields are I will say this model structure what we are creating a deployment also has a data type. So you can see that there are basically Two data types at this moment, so, one is seen continuous. So, the sales amount, he said it is continuous continuous means hundred hundred and 101 point 101.2.

And key time means, you know, basically Saturday, Sunday, Monday Tuesday, no discrete value getting it. So, basically I have said yes, this is a key time and this is continuous. And I'll say next and as a finish. So, you can see that he is now showing us the mining structure here saying that you have two fields you know, and you can see that he says that this is the key time you can look at the icon here and this is the thing What do you want to predict? So, now, we have defined the algorithm we have defined you know, from where we want to go and run that algorithm. So, in other words, we also have defined the source of the data source right you know, where it has to go and run it.

So, let us go and process this algorithm. So you can see that there is a process button here. It says process the mining structure. Remember this Process process over here is not the process of the cube. You know, at this moment, I don't have any cube in my project. So don't confuse this processing with the cube processing.

This is the mining structure processing. Okay? So I'm going to go and say okay process this mining structure, what does this process mean? Actually, this process means nothing, but he actually tries to use that algorithm and tries to run on the historical data and then it will try to give you out some results to you, right? So I'm gonna say yes, process it. And there it goes.

It is now deploying, deploying means, you know, whatever understanding it is and whatever understanding he is getting from the historical data, he actually goes and stores it into the analysis services. So I'll say run the steward and that it is getting all the data it is processing No and everything done good. So, let us go and close this and let us try to view this data, right. So in order to view this data you can see there are some tabs given here at, you know, at the top here. So, as you know, so this is the mining structure and this is a mining model. We'll talk about both of these tabs in detail more later on, but at this moment, you know, I would like to go and view my model right I would like to go and view what has happened.

So, you can see there is something called as mining model viewer, so, I'm going to go and click on this mining model viewer. So, he says again, it appears to be out of date please deploy it, I will say Yes, go ahead and deploy it. There it is. And there is our mining model. Right, which you can see. Now, you can see that, you know, across all the screen you can see this word Model Model mining model you know, you can see the word model in a lot of places.

So, what exactly is a model? If you see in real world you know, let us say you want to build a big car, right the first thing we do is that you create a model and then that model gives you idea that how a real thing would look like. So, model is nothing but, but model is a thought process or thought process, you know, by looking at the thought process, you can know that how that thing would would would look you know, when it really happens or when it actually executes right. So, here again the model is the same thing. Now, let me elaborate again more on that model over here. If you see, at this moment, we have algorithm called as time series, and this time series algorithm is actually going and running on the historical data.

Now, the algorithm by itself is dead. It is just a logic. But once that logic goes through this historical data, right, so, when the logic goes through this historical data, he gets a thought process for Example, when the logic goes and runs over this, he says that okay, from 1000 it became 3000. So 2000 is a difference from 3000 it became fought for 4500. So, basically he looks at this data and he starts thinking something, he starts becoming mature, or he starts having a logic. So remember, algorithm is one part, but once that algorithm runs over the historical data, he gets a thought process.

And that thought process he uses to predict. So this model over here is nothing but your algorithm plus your historical data. Right. So basically, the, it actually tells you that how your algorithm has been trained. Remember, at the end of the day, this algorithm is like a dog actually, you know, it's a very sweet dog. You know, it, it has a, you know, it has a lot of technical things, but it only becomes intelligence, you know, when you start training him with this data, isn't it so when you train him with this data at that time, this model starts becoming intelligent, right?

So at this moment, what you see over here, this model is nothing but your logic plus the historical thought process. That is what the model, what you see over here, just quickly revising, you know what I talked in the past five past two, three minutes, because I think this is a very important thing, and I should repeat it, and I should impinge this thought process into your mind. So basically, what I'm saying is, you have a logic, you have a algorithm, and this algorithm gets trained by historical data. And the output of that algorithm and training the algorithm is nothing but the model. A lot of people also termed this as artificial intelligence, you know, where basically, you try to basically train algorithm and the algorithm start thinking like a human, right. So that's a very quick recap of what exactly is this word model here?

So, we can see now, first, when you see this screen over here, you can see down below we have the years 2001 2003 2005 and here we have the sales amount, this dark line here is nothing but the historical data. And this dotted line is what he is thinking, you know, how he will actually predict. So, we can see that this dark line is what is your historical data? So, we had historical data till 2006 right. So, this dark line here is till 2006. And after that, you know, he has drawn his thought process, right.

And you can see that we have something called as prediction steps, you can always go and increase the prediction steps, and you can predict for longer number of years, right. And also you can see there's a small checkbox here for division or deviation I'm sorry. So, if you click on this, you know, it just says that At what point you know, the algorithm started changing the trend, you know, or it started making moments. So even those You can see. So, basically over here now, what we are seeing is the thought process. Also, I would like to say something more about this training of algorithm, the algorithm gets trained from the historical data, right the historical data is the source of his knowledge.

So, if the historical data is bad, or if the historical data is not genuine, then the algorithm can go very, very bad. So, remember that the first step you know towards successful data mining is very, very good historical data, right. So, if it historical data is bad, then this algorithm is going to get trained in a very, very bad way. For example, now, let us say, my data is something like this look at it hundred then suddenly it becomes 20 then after that it becomes 500 and then it becomes 10 then it becomes 12 and then it becomes something like this right. Now, here if you see this, this data is completely hidden. Why right it is haphazard isn't it?

If you now try to go and just build your mining structure let us go and build a mining structure again let us see what happens. So, let us again go and deploy and let us see that what kind of thought process he brings in now. So, again let us go to the mining model viewer yes reload it. So, there it is now you can see you know when you see such kind of a sign you know very starts lying flat you know when he says that I'm going to lie flat they know why, because he says I'm not I'm not able to make sense out of your historical data, right. Remember, you know, when algorithm reads from historical data he tries to first detect a trend you know, like uptrend or downtrend, especially, you know, when you have such kind of time series algorithm, they try to detect a trend, but look at your past data some time it is down sometimes up then certainly low lying sometimes up.

So, you say No, you can see that now, he says, I am not sure you know how to predict this. So, when you see such kind of a flat line over here with this indicates that your model is actually not thinking he does not know what to do and you have a very, very bad model at place. Now, remember that this thought process has been deployed. So, when I actually you know, did the process mining structure, what it did is, you know, it actually stored this complete thought process into the analysis services. So, very quickly, let me just connect to my analysis services. And let me show you where do you see your data mining structure slow and you know what, by mistake I connected to tabular model I'm so sorry for that.

I need to connect to the multi dimension model and one more important point here. You know, this mining structure gets deployed, only interest multi dimension and not into tabular. Okay, so if I go here you can see this customer data mining. So in this you can see that there is your mining structure. And this is our mining model. And in order to go and query this mining model, we use something called as DMX, remember, MDX for multi dimension queries, and DMX to query data mining data mining instances, right?

So you can see that this thought process what you see here in your visual studio has been deployed here. And now SSRS can use this thought process and predict data and predict you know, the sales amount. Now, in order to go and fire DMX query, you can also do it from this Visual Studio over here or from the data tools. So you can see that we have the mining structure, we have the mining model. So the mining structure is where we go and process the Structure mining model is where we can go and see the structure and see what kind of inputs we have given for the model. And viewer is where we can go and see the model how it is working, how it is thinking.

And you can see there is something called as mining model prediction. So this is where you can go and run your DNS query. So you can see that there is a UI here. We'll talk more on this UI later on. By this moment, I can go and say your, my source here is I want to go and use the prediction function. And I want to go and predict.

Okay, I want to go and predict. So what do you want to go and predict, you can see that over here, there is something called a stable column reference. So I'll say I want to go and predict the sales amount. So I'm gonna go and drag and drop the sales amount over here. And then I can go and fire this query. So I can go here and say that basically now this is the query window and I can go and say result.

So the time said is as you can see that he has now predicted for 2007 in our data if you see at this moment in a we have the year till 2006 you can see that he has predicted for 2007 right. So basically this is this is this is happening because of DMX query. So basically I'm, I'm firing DMX and then I can go and also predict for other years for example, if you go and see this predict function over here, which we will see later on in detail. You can see that this predict function here includes the column reference and how many items you want. So in case you say here, let us say five. So it will say 2007 2008 2009 2010 and 2011.

Right, or you can use start and end as well. So, we will look into the DMX qualities later on separately as we saw the MDX queries, but for now, you know, understand that you have the mining model, the thought process and If you go and you file DMX query on it, and you say that Okay, tell me what will happen for 2010 tell me what will happen for 2012 by by looking at this mining model structure. So that brings us to the end of this 30 minutes video. And in this 30 minutes video, my goal was to introduce to you data mining, and how it is done in msbi. So in this video, we talked about algorithm we talked about how to train the algorithm, we talked about historical data, we talked about model, we also saw a simple example of time series algorithm, right. And we also saw what is DMX and how to make a query etc.

So we're going to go and continue more with data mining. So, what what is going to happen in the coming up labs, you will see SSIS you will see SSIS you will see SSRS and also you will see business intelligence that is data mining. So in the forthcoming lectures of data mining, I will cover more algorithms. So if you remember over here, you know we have other algorithms as well right. So my goal We to cover more two or three algorithms from this, you know, important ones. So our goal would be to at least cover almost all algorithms if possible, and understand data mining.

Well, thank you so much and keep learning and I hope that you like this learn msbi series. Please, please, please, if you like it, please do go to facebook.com slash questpond. And you and please do put your thoughts in what do you think about this series. Thank you so much.

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.