Lab 38 :- Aggregations in SSAS.(SSAS)

MSBI Step by Step Training Lab 38 :- Aggregations in SSAS.(SSAS)
49 minutes
Share the link to this page
Copied
  Completed
In this video we will discuss a very important and tough topic called as Aggregations.

Transcript

In this video, we will discuss a very important topic and a very tough topic termed as aggregations. So, if you look at this SSA as tabular, this tab is the person who is responsible to do aggregation. Whenever you have performance problems in SSIS, your cube is loading slow, your data is fetching slow this is the first tab you will always visit. So, let us first try to understand the English meaning of aggregation. Aggregation means something that is calculated by using several other elements means what? For example, look at this cube if you see in this cube let me go and clear all result.

So, in this cube I can go and display the sales amount, right. I can go and see the sales amount, day twice. I can go and see the sales Mt month wise and I can go and see the sales amount year wise. So, if you see here this year adds up or aggregates value from the month and the month aggregates value from the date. So, aggregation means, you know doing summation or or average by using several other elements. So, if you see at this moment especially in this tree hierarchy over here of time dimension, you can see that the date is added up to the month and the months are added up to the to the year right.

So, this is termed as aggregation. So, in short you know dimensions you know which have hierarchies will have aggregation. So, if you look at this time dimension over here, if I go to the time dimension, you can see the hierarchy over here which says that the year is calculated from half year half year calculates from month and the month calculates from the date. Remember this time dimension we had already explained in one of the previous part of the video that is in lab 10. So in lab 10 if you're if you remember, this SSIS time series was discussed, so this hierarchy was created automatically by the SSIS template. So if you want to go and see the hierarchy of how the aggregation is happening, you can go to the dimension structure, and you can go to the attribute relationship.

And over here you can know that how every dimension is calculated from the Child Elements. And also if you go to the dimension structure over here, you can see this small dots over here which also indicates hierarchy. So this says that this four dots here indicates that this is the last leg of the element. The month is the parent of the date, half year is the parent of the month and the year is the parent of everything so the year is calculated The aggregation of the half year half year users the month for calculation and the month users the date. So, basically if you want to go and see the relationship you can go to the attribute relationship or you can see the dimension structure to see that how the hierarchies are structured. Now, these hierarchical aggregations are calculated during runtime.

So, for example, in a when an end user comes and he browses a report, let us say, So, let us say he is browsing year 2008. So, what SSRS does internally it goes and says that, okay, inside the year 2008, how many months we have, so we say okay, we have January and March, so every individual month then goes to the dates and calculates the total, then the total is rolled up to the months and the months total is rolled up to the year. And this whole thing happens during the runtime. So now think about that. If you have a lot of lot of records, let's say you have millions of Records. And if everything is happening on runtime that can make your application really, really slow.

So if you can just think out of the box, you know, let us say that if we can pre calculate these calculations, rather than doing on runtime that would really improve, improve the performance of our SSIS cubes, right? So rather than doing these calculations during runtime, if we can pre calculate the pre calculate them while we process the cube, then that would really make our application fast. And that's what exactly aggregation does. Aggregation actually goes and stores the pre calculation of each one of these aggregations. So this tab over here is nothing but you know, it's the tab where you go and define saying that okay, I want to do aggregation at the year level. I want to do aggregation at the month level.

I want to do aggregation at the week level. So you go and you define the aggregations over here, and when you go and process the cube, you know, these aggregations are pre calculated or they are calculated while the cube is processed. But now, before we go and create aggregations, let us try to understand that how does SSRS queries perform when we do not have aggregations? So, let us go and write a query, which will go and select. So, I'm going to go and write a query which will select the year we'll select the sales amount on the year level, okay. So select the time dot here on columns from sales OLAP.

So you can see that here we have a very simple query which actually returns the total sales of the year. Okay, so let's Now go and fire the SQL Server profiler. And let us go and your to Analysis Services. So I'm going to go and here to Analysis Services. And let us see. So, let me clear the trace window and let me execute the query.

And let me quickly go back to the analysis services. And you can see that there are a couple of things here. So, you can see he has displayed some data over here, so we can see that he has displayed the query you know, which is fired, that is select time dot here, right. So now just concentrate you know, on this area, so just concentrate from the query begin to the query and forget about the other events, you know, those are XML queries and they are fired for some other purpose. But at this moment, our concentration is only on These things, you know what are seen in the profiler. Now the first thing is, the way to read the profiler data is you can see that there is a main event and there is a child event or I'll say it's a subclass event.

So this is the main event. And this supplies even tells you more details about the main event. For example, here the first event what you see here is a query begin event. So there is a main type. And the subtype says that what kind of query it is, is it a MDX query? Is it an XML query?

Is it a DMX query okay? And what kind of query it is, so you can see this query says that this is a query which actually fetches the year values from the from the from the OLAP. So this query begin event and the query end event tells you the start and the finish of the events. Okay, so at this moment, our goal is from here till here. The second type of event we have is a progress event. The progress event tells you more details about you know, from where the query got executed, how the query got excluded.

So you're the selling telling that the progress report event is saying you that the data was read from the fact sales partition. And then we have the query subgroup event. The query subgroup event tells you that what dimensions were read okay. So, we can see other queries of currencies that you know I am reading from a non cash data non cash it I mean, this is not a cache data it is read from the hard disk and at this moment, you know, we are not reading any kind of dimension values dimension values means, if you see over here, for example, now, this is the age range you can see over here I'm gonna go and move this over here. So, dimension values means that it is not reading this 20 to 3040 to 50. So, in other words it is not reading any direct member values for example country name so, it is not reading India Nepal etc okay first thing let us try to understand that what these 00 or the 00 indicates the the dimensions you know for example, you can see at this moment I have h dimension It has two members inside it or I'll say it has two properties inside it or to measure two properties inside it one is age range and age ID.

So, this to zero indicates the dem age with the 00 look at the second one country. Here also I have country name and country ID so, again 00 look at product ID product name again Oh 00 after that I have date. Okay. So date has 1-234-567-8910 11 so 1-234-567-8910 11 So, basically, again this indicates the dimensions and then the salesperson ID again. So, each one of these zeros indicates you know that each one of these properties here and what it is saying at this moment it has not selected any one of these properties. For example, now, let us say I go and clear all this cache data and let me run this and let me go and see that such the value of 2001 Okay, you can see now, I'm not saying time dot here, actually, I am saying time dot year dot for 2001.

And I'll say execute Okay, look at this. We'll pause here. So we can see now look at that one. Look at that one over there. Right? That one indicates that basically, we're used the year.

Okay, the one indicates that you have used in one of the dimension values we have fetched, so you're not fetched only the parent, but you're fresh in one of the dimension values. Okay, so this indicates that the dimension value was fetched. So basically, I just wanted to start off with remember three events. One is the query beginning event, and the query end event. Second one is a progress report event and then the query sub goobie sub event, which tells you that is it reading from the cache? And second, is it reading the direct values like you're about 2001 country, not India, right.

Good sir. So, the point where is that at this moment you know, it is not not reading data from aggregations it is reading data directly. So, in other words you know, when I go and see your time dot year all the dates value was added to the months and run this. So, all the beads value so, if you see at this moment This indicates that when you say time dot here It indicates that when you say that you are reading directly from the from the from the from the organ good you know you can see that this is nice, you can see now, I do not see the progress begin event and the progress and the event. Right Why can you guess because is reading data from the cache. If it is reading data from the cache, that means it does not need to go and make a hard district.

So that's why you can see now in this case, I do not see any progress begin event, and the progress and event and what I will do is in order to just remove this cache, because at one moment of time, we just want to make sure that, you know, we clear the cache right? So that we can see the events. So what I'm gonna do is I'm going to use this XML a command, what you see here, so I'm going to go and fire this XML a command. So Object Explorer, remember, XML command if you want to do any kind of admin activity like clearing the cache, backing up the cube, etc, right? So I'm going to go and fire the XML command here and clear the cache. So you can see this XML command is a very easy Standard XML.

So, once you have clear the cache now, let us again go and file the query. So, if you see again now, you can see that the progress begin and approvers can be seen, and again that non cash it I seen, and again if I fire because once it is read it is again cached. So, again if a fire the query, you can see that, again is unique from the cache data. And I'm not seeing the progress begin and the end event right. So, again let me go and clear my cache. Let me start the profiler.

And let me again do a select and then it does so again I see the query evens, right so, basically This progress begin event will be only seen if you are not reading data from the cache the time you are reading data from the cache, you know, it is not needed right it will be read in memory. So, two three points to remember, we talked about three events query begin progress and query sub cube, the query begin event at the query end marks the start and the end. The query sub cube event tells you that if you have selected the direct dimension values and the progress report events actually tells you that from where the data is read, so, at this moment concentrate on the progress report begin and the end event. So your at this moment what we are seeing here is it is actually reading data from the fact sales partition and we do not want this because if he's reading data from those facts in partition, it is building everything from scratch.

It is totaling up the dates to the months, the months of the year and so on right. So let us go ahead and create a partition. And let us see that you know if it makes difference in this progress report begin and end right. So, let us start creating the partitions. So, let us go here. Now, remember that creating a partition is art.

So, basically you have to know that which of the dimensions actually need aggregations for example, in this case, if you see my age, my country my product, my salesperson, all of these dimensions do not need aggregations at all you know if you see at this moment, the time dimension is one of the one of those people you know who has the hierarchy. So, there is a high probability of aggregations in this dimension right. So, time dimension is what I should be concentrating on for creating part for creating aggregations. So, I will go here and I will say that I want to go and start creating an aggregation. So, you can see here I have moved my mouse and it says design aggregation. So, let us click on it.

The timesheet design aggregation it starts about wizard So, I'll say yes next Ok. Now we can see that there are a lot of lot of things over here you know, there are lots of lots of first thing you see the dimensions and then you see a lot of options over here called as aggregation usage, okay. At this moment what I will do is I will select the default okay. So I will select the default so default means that let the aggregation for the aggregation visa decide, you know what is the best option? Okay. I will talk about these three usage later on. But at this moment, let's just go and select the default thing and let us say that okay you go and decide, and later on I will go and review this.

Okay. So I'm going to go and set all to default and I will say next. The next thing what it says is that you know, can you please give the total number of records or total number of count of data or do you have because the more the amount of data then the aggregate is more needed, right? So either you can go and you can edit this value and you can put your hundred thousand 10,000, whatever it is, or either you can use this count over here to get the, the total count, right. So you can see I've just clicked on the count and it gives me the value, okay? In case you have a large data production in a way you do not want to go and fire this count, then you can go and type the approximate value over here.

So I'll say next, right. Now, this is a very important section. Now when you create aggregations, right, it actually takes a lot of hard disk space. So when you create aggregations, definitely it needs some extra more hard disk space to store that aggregated value, right. So the first thing you would like to say that Okay, do not take lots of space and you can give you can see here, okay? Until the space becomes one GB till then you know go and calculate whatever aggregations you can calculate okay.

The second one is you'll see that C hardness is not important because hardness is very cheap. So, I do not want that my heart is to limit my aggregation then you can go and say okay like saying okay until the performance gains gains reaches 20% or 10% please go and create as many as aggregations you can think about Okay. Now, remember that this performance gains should not be confused over here like saying that okay, if I put hundred percent that means, you know, it will become double don't do that, okay. So, this performance gains you know, just says that basically it is just a number you know, which says that, how of performance you want to gain and normally, you can have a 10% or 20% performance gain and not like hundred percent right. So, put a value between 10% to 30%. So, if you put 100% then it will try to calculate as many as aggregations as possible.

Okay. So, that's What that hundred percent means, but this hundred percent does not mean that your performance will increase hundred percent. Okay. And the next thing is you can say, okay, go and start. And until a stop, you know, figure out whatever aggregations you can. So I will give you a lot of time, and you go and figure out a lot of aggregations, whatever aggregations you can.

So I will take the first one here, and I will say, until this reaches 100 Mb, please go and start calculating as many as aggregations you can think about. So, when I click on start here, it starts going to your dimensions to your fact data, and it starts thinking that what kind of possible aggregations it can calculate remember, when I started the resolve as I selected default, so default means that I'm telling to the aggregation wizard that you go and figure out depending on the dimension type, depending on the hierarchy that which are the possible aggregations you want to do right. So you can see it has said that I have you know 22 aggregations have been designed and it looks like you will have a 32% increase in your performance and you know it's going to acquire 3.1 kV. Now remember that at this moment I have very less records so it does take it is saying 3.1 kV, but when you have millions and trillions of record, definitely this will go in GB, right.

So that's fine. I will say next. And I will say just save the aggregation for now, but do not processed them. Okay, I would go and say finish. So I can see now that he's saying that I've created 22 aggregation and the partition size is 86. And at this moment, I just have one the one partition, so this aggregation design has already been applied to that partition.

You can see here, this is my partition, if you remember in one of the previous videos, Suki So, had talked about what exactly are partitions. So, you can see that this partition here has been applied this aggregation design, right. So 22 aggregations have been created, what I would like to know that, what aggregations Have you thought about, did you think about creating aggregations on the country, did you think about creating aggregations on the product on the year right. So, for that, you can see that there is a standard view and there is a detail view, so, this is a standard view at this moment. So, let us go and click on the detail view or the advanced view. So, that we have the advanced view and larger So, you can see these are all partitions a zero a one a two a three a four, you know till a 21 So, that is we have 22 aggregation something right.

So, that is those 22 aggregations right. And here it says that it has it has aggregated on the ID. It has aggregated on the country ID. So you can see here, he did not aggregate on the age range and country name. Why? Because these are text fields, right?

So on text fields, there is no aggregation. Right. So he has just used the numeric values to create aggregations, right. So you can see that on the on the product on the year, right, so it has used aggregations everywhere. So he has put aggregations everywhere. Okay, good.

What I will do is, you know, at this moment, I'm not really interested to do aggregations on the age, right, so I can go and check all these. Right? I'm not interested to do on the country ID. So basically those dimension values where you see I'm not interested to do aggregations, right. And you can happily go and delete aggregations from here as well. So you can go here and delete this aggregation.

Can See, I'm selecting very quickly. I can go and delete. Why do you think 15 so you have the ability to go and delete the aggregation from here, okay, you have to go and expand his properties and you can see this correction corrected symbol, you can right click and say delete. So you can go as well and delete some of these aggregations which are unnecessary. Remember that unnecessary aggregations have come up because I selected default. So if you control that aggregation usage, you know, you won't see these kind of unnecessary aggregation.

So we'll do that in the next phase. So let me at this moment you know, go and just build this aggregation what I've got here, so I'm going to go and do my cube. Right, yes. Slow. So basically, I've used the default mode, and I have created the aggregations. And now let us go and fire or XML a query.

And let's see that what difference do we see in our SQL Server profiler? Right? So I'm gonna go first thing and clear the cache because if it is reading from the cache, then I won't see any aggregations. I wouldn't know from where he's reading the data right. Now, let us go back here. Let's clear clear everything and let us go and execute this MDX query.

Right. And let us see. Interesting like can see it is seeing that I'm reading not from the fact sales But from the aggregation seven right. So, what is aggregation seven if you remember aggregation seven so aggregation seven is this aggregation seven is seven. So, it is seven. So, basically what it is saying is that I think in the seven it is the year.

So, you can see that he has used his seven aggregation to read the value. So, what I mean is that at this moment it is actually going and reading from the aggregation This signifies that your aggregation have been used right. So, in the progress report if you see you are reading from the fact sales, right. So, that means it is not using aggregations, but if you see the word aggregation and also the aggregation from where it reads the value you will come to know that he's using the aggregation So, this says that yes, it is not calculating things on the fly. But now, these aggregations you know, what you see over years you know, they are they are completely unmanaged you know, we just use the default and we created it and the aggregation designer just went crazy and it did whatever it wish to do right.

So, definitely This is not the proper way of doing aggregations right. So, let us go back again to the default view and let us delete this aggregation okay. And let us create with more in a more controlled way in a more proper way right. So, let us again go and click on design aggregation. And at this moment and in this in this case, what we will do is, we will think about that, what is the best choice for us. So, what I will say, you know, that I do not intend to create any aggregations on the age on the country, on the salesperson So, you can see that I've selected none over here.

So this nun indicates that aggregations will not be created on these dimensions, because the time dimensions is where you know, I would like the aggregations to be created okay. So, the first one was default The second is none right. The third one is full the full says that for example, if I go and select year over year right, if we go and it will create aggregations for the month it will go and it will create aggregations for the week. So, whatever is that you know, top and bottom of the hierarchy will go and create aggregations every you know, to the full level okay or it will it will try to use this aggregation in every one of those facts, okay. So this full means that basically it will it will take end to end in a variable this dimension is used to create an aggregation, but definitely, I don't want that because It is very much possible that a lot of people are not quitting on the month.

A lot of people are not quitting on the week, right? So remember that if you're selecting full that means it's going to acquire a lot of hard disk space okay. And then we have unrestricted, this unrestricted says that we are telling to the aggregation designer that you decide what is the best. So, you decide that basically which aggregations are the best whatever it is your thought your mind use that So, when I say unrestricted, I am saying that I'm not restricting the aggregation design wizard. Now, the next question which will come to your mind is that So, what is the difference between default and unrestricted, right? So, very quickly, none is None.

None is very easy to understand. Don't create aggregations on these dimensions. Full means use the full tree hierarchy, you know, and create the aggregations at all levels for that dimension. Default and under student says that Returning to the aggregation wizard that you decide, you know, what is the best thing for this dimension. But now, the question is that what is the difference between default and unrestricted? The default is more sophisticated as compared to unrestricted unrestricted means, that we are telling to the aggregation design, there are no rules I am giving you full right to do whatever you want to do.

But in default, you know, we have four rules here we say that first thing the default actually will decide that basically should he go into a full mode for dimension should he go into unrestricted mode should it go into a non mode right. So, there is there are some rules on which he decides, for example, if he sees any many to many relationship, and if he sees any special dimension types, he actually suggests to none, right. If you see a natural hierarchy at this moment, if you see our our time dimension is a natural hierarchy. Natural hierarchy means you know you have a year, then you have a month, then you have a week then you have dates. So, you have such kind of a natural hierarchy, then it will switch to unrestricted and orals we switch to none. So, basically here we have certain rules, you know, that are applied by the wizard.

So, the wizard your is governed by rules while unrestricted, it is not governed by any kind of rules. So, you can think about the default decides to go to unrestricted to go to none or to go to full depending on these four rules over here. So, first thing is implied to go to unrestricted, second is, if you find any special dimension types, he will go to none of them sorry, if you find any natural hierarchy, he will go to unrestricted. So, basically let us go back to the wizard. Yes, sorry. So you can see her and this woman, I'm saying For the time dimension, I'm telling you follow certain rules don't go crazy like without any rules.

So, follow your rules and I am selecting them as default definitely because this is a natural hierarchy, it will actually try to fall to unrestricted right. So, let me do a next again account on next and let the system so, again at this moment, let us see that what aggregation it is creating and you can see that he has designed three aggregations So, this is much better right. If you remember, previously we had 22 aggregation held right. So, because we just set default to everyone and it was crazy. So, now with three aggregations you know, I feel that we will have very better and logical design over here. So, let us go Go to Advanced view and let us see, so, first thing definitely for age I don't see any aggregation for country I don't see any aggregation for sales person, I don't see an aggregation.

So this is much better, but let us look at time. So for time he has created aggregation on year and half year. And I think it's absolutely fine. You can see that he has not done aggregation on the month, he has not done aggregation on the child's you know, and that is fine, you know, because if you just aggregate the top level, and if you even query the month will try to use this aggregation, the year aggregation. So, remember that, if the top level has aggregated then even though and if you go and query the child nodes of the aggregation he will use the aggregation still, so, you don't have to go and do aggregation at all levels. You are you are you're increasing your heart displays unnecessarily right.

So, this is a better thing. Let and I feel that this is a better design. So, let me go and build my cube structure. billis So, let us go back to a profiler, let us start hearing from the profiler, let us say Edit clear trace window. Let us create a cache because we want to see all the events. And let us execute this.

And let's see what we see. And there it is. So you can see now, it is reading from zero aggregation so it can see that it's actually reading from aggregations, right. Great. So that's again, I can see that my aggregation has been has been used. Now remember that if you're reading from the cache, for example, let us say now I go and I find out The Video Game.

See, you can see now I don't see it is not reading from the aggregation right it is actually reading from the cache. Okay, so you can see now, once you run the query, it is very much possible that you can cache once you catch us it is not reading from the hard disk he's not reading from the aggregations. He's actually reading from the cache, right. Good. Now, I would like to highlight one very important thing, a lot of times when people do these demonstrations, they do not see the aggregation in that profiler. So one of the reasons of not seeing the aggregation would be the default value.

If you see as SSE as you know, does not do aggregation until it crosses the 4000 record limit. So if you do not have 4000 Records, you will not see this aggregation value. And if you see at this moment, I hardly have any recourse to live seeing it. So what I've done is I have gone ahead and Have overridden. So you can see this file over here, which is in C program files, Microsoft SQL Server, Ms s 10 dot 2008 OLAP slash config, right? So go and open this file, so I'm going to go to the properties.

So, you can see this path over here, right and just copy this path, so, you can see it. So, basically you have to go to this path and change the setting. Now, remember that if you have 2012 or if you have 2014, then you have to see according to the versions, so you have to go here to this folder. So let me go to this folder here Ctrl R and in that you will find that there is one config file called as Ms. md SRV. So you have to go and edit this config file. And in that we have one setting which is the index build threshold.

So you can see I've given the value as 10. So in other words, you know, if I've just records more than 10 I should be you know use aggregations. So, over here this index build threshold is 4000 or something. So, there are some X amount of records. So, if you are not having that many records, the aggregation will not be seen. So, please ensure that you go to the config file and change this value to see the aggregations and definitely restart your SSL as a service because or else these values will not be picked up.

Great. Now, if you see at this moment, you know you have done aggregation by using the design mode. So, we went to the design mode we took the help of the SSIS wizard. We also thought logically what to apply and what not to apply and we created the aggregation but if you ask me real time in real time How to go and create aggregation that I should know know the actual workload. If I know the actual workload, if I know that my users are firing lot of queries on the time dot here or time dot month, then I would create aggregations accordingly. So to start off with when you're actually going to production first time, you know you're launching a project, this design mode is good, but then later on, you know when the application starts and it is running in production, you if you can go and measure the workload if you can go and say okay, there are 10,000 queries getting fired.

And from this 10,000 queries, you know, five queries are firing on dim time fire queries are firing or dim country. So and then later looking at the load if you want to go and fine tune your aggregation, then this is where we have the second option your callers usage based optimization optimization. So first one is designed based, you know, must be this is the first phase you can start with, but later on to fine tune us the uses based optimation optimization, right? So what I would do is I'm going to go and delete this aggregation at this moment. And I will click on the usage based optimization. But before I do that, we have to capture the workload somewhere, right?

We have to say that okay, 10,000 queries are getting fired. And we do somewhere define the workload, right? So, to capture the workload, you need to go and do certain things. So you need to go to your SSIS, that is your Microsoft Analysis Services, and go to the properties. And over here, I've just reset everything to default. There is something called as query log, right?

This query log is nothing but you know it, you know, on this SSRS, whichever queries are getting fired, it will help you to log those queries into a table, right? So you can go here and see enable this as true, right? And the connection is so I will go and set the connection to my same database that is sales OLAP. So if you see at this moment my database is let me quickly check that my databases, dw OLAP. So sales dw OLAP Okay, so we'll go here and say please use sales dw OLAP. Right.

And he's saying that what is the sampling rate do the sample after 10 records or two records are one record from which is said to in a sing that okay keep sampling and I will see okay. So, the time I do that what it does is in the sales dw OLAP it goes and it creates a table called as OLAP query log. So, we can see that there is a table here created called as OLAP query log. Now, what happens is, you know, when I go Fire any query. So if I won't fire this query, let us say one or two times, right? This query gets logged into this OLAP query log here.

So, if I go and see now, this OLAP query log, you will see that it has, you know, it is showing basically what kind of queries have been fired. Now, remember that I've been you have this OLAP query log was created a long time back. So, you can see that it is actually showing a lot of records. But you can see that it is saying here that this analysis services project file on that somebody has fired the query on the facts unstable, right. And that is that right? So basically, if I go and find one or two times more, I should see the records increasing.

Right, so that if there's one more record is seen, so this is OLAP query log captures the queries of MDX which are fired on your SOC s. So once I have the load once I Know that okay these are the queries. Now I can go to my usage based optimization and I can say now yes say next. And I will say that Okay. I will give a beginning and the end date. So I will say that use today's date, because I captured some of the query logs today, right? Oh, there's some problem of let me take 17 for now, right.

So whatever is the beginning and end date I've taken also you can run define saying that that, you know, basically if you want to just say that, okay, if user wise if you want to go and select the query log, you can do that. Or you can say that, okay, don't take all the queries just take the queries which are used 50% times or which are used the most times, you can again, use this option button here. So I'll just say next I don't know why it was from start here. Why isn't it showing? Okay? So I don't know why it is not showing the queries here.

Let's see, I have some queries which are strange. For some reason, my query log, remembering that we have given that sampling rate, so I'm not sure that because of the sampling rate Because not logging finish this file multiple times. For some reason these queries are not getting captured it looks like let us go and review our sampling rate. Okay. Must be able to find the final sample rate query log. I think the sampling rate is very low.

This just went back to the default value for some oh there it is 22 Now I can see. So, I think my sampling rate was less right Good. So I can see now some records have 22 great. So let me go and use this 2222 is actually today's date. So let me now see if I can just take the value of 22. So that's something that was a problem.

Let me see if I can take 22 value next Okay. Next now that it is fine. So, basically no it is my mistake. So again, you can see the same result is now starting in again the count. Again, all these things are same, but at this moment, what it will do is it will actually use your query log log to figure out that what is the most used aggregation? And you can see he has just designed only one aggregation and I hope that he just does the aggregation on the year.

Okay, I hope so. So He has designed one aggregation and I hope. I'm not sure what he should have shown me on the year. I'm not sure why he why he has not shown. But yeah, sure he should have shown me on the year. The moment that I had this moment, you know, I have hardly any records.

I have hardly any data, right, I did not fire large number of queries. So that's why I don't see any of the aggregation picked here. But what would happen in real time, I would have seen that you would have created this aggregation on the year. So, there are two ways of doing aggregations one is that you can go and you can use your design aggregation and the other one is you can go and use the query log or the usage based optimization. So, let us quickly discuss you know, what are the best practices for aggregations? First one, start with design design based aggregation approach.

And then finally go and start using query long approach you know when you are in production and when you are in life, so every like one month or two months or six months in or try to, you know, review your aggregation by using the query log. Second thing, you know, use the aggregation usage very carefully. So, if you see that there are a lot of MDX queries which are using a certain dimension, then go and use the full if you see that some of the dimensions look very straightforward and you do not need aggregations then ensure that you make them none and last one is you know use default you know, in case you are not sure, right. So, basically useful You know, when you know that this aggregation is used like anything in the project Left and Right, right. So, useful exclude dimensions you know, which are not needed for aggregations, and then use default, unrestricted is very bad because unrestricted.

What it does is, there is no thought process behind it right, it will just go and say that, okay, if the data type is numeric, then do that. negation. If it is not numeric, then don't do the aggregation. So I feel that unrestricted is to lose. You know, for aggregations, I would suggest that at least you know, follow those four rules, you know to create the default aggregations. So these are some of the best practices I've talked about about aggregations.

So that brings us to the end of this one hour video in this one will video we saw that what exactly are aggregations you know, how it can improve this year's performance. And then we saw the two ways of doing aggregations one is query based one is designed based, and we also saw something called as the aggregation usage, you know, which you can go and fine tune that which dimensions to select. Thank you so much.

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.