Aggregation Commands

Introduction to PowerQuery Linking and Organizing Data
10 minutes
Share the link to this page
Copied
  Completed
You need to have access to the item to view this lesson.
One-time Fee
$69.99
List Price:  $99.99
You save:  $30
€64.93
List Price:  €92.76
You save:  €27.83
£55.85
List Price:  £79.79
You save:  £23.94
CA$95.75
List Price:  CA$136.80
You save:  CA$41.04
A$105.88
List Price:  A$151.27
You save:  A$45.38
S$94.72
List Price:  S$135.33
You save:  S$40.60
HK$546.91
List Price:  HK$781.33
You save:  HK$234.42
CHF 63.46
List Price:  CHF 90.66
You save:  CHF 27.20
NOK kr757.20
List Price:  NOK kr1,081.77
You save:  NOK kr324.56
DKK kr484.44
List Price:  DKK kr692.10
You save:  DKK kr207.65
NZ$116.32
List Price:  NZ$166.18
You save:  NZ$49.85
د.إ257.06
List Price:  د.إ367.25
You save:  د.إ110.18
৳7,685.71
List Price:  ৳10,980.05
You save:  ৳3,294.34
₹5,844.92
List Price:  ₹8,350.25
You save:  ₹2,505.32
RM331.68
List Price:  RM473.85
You save:  RM142.17
₦99,352.90
List Price:  ₦141,938.80
You save:  ₦42,585.90
₨19,422.12
List Price:  ₨27,747.07
You save:  ₨8,324.95
฿2,569.35
List Price:  ฿3,670.66
You save:  ฿1,101.30
₺2,257.93
List Price:  ₺3,225.75
You save:  ₺967.82
B$359.91
List Price:  B$514.18
You save:  B$154.27
R1,286.96
List Price:  R1,838.60
You save:  R551.63
Лв127.01
List Price:  Лв181.45
You save:  Лв54.44
₩95,596.49
List Price:  ₩136,572.27
You save:  ₩40,975.78
₪260.69
List Price:  ₪372.44
You save:  ₪111.74
₱4,021.38
List Price:  ₱5,745.07
You save:  ₱1,723.69
¥10,900.73
List Price:  ¥15,573.14
You save:  ¥4,672.41
MX$1,172.87
List Price:  MX$1,675.60
You save:  MX$502.73
QR254.79
List Price:  QR364.01
You save:  QR109.21
P953.47
List Price:  P1,362.16
You save:  P408.69
KSh9,270.57
List Price:  KSh13,244.24
You save:  KSh3,973.67
E£3,317.36
List Price:  E£4,739.29
You save:  E£1,421.93
ብር4,002.19
List Price:  ብር5,717.66
You save:  ብር1,715.47
Kz58,536.37
List Price:  Kz83,626.97
You save:  Kz25,090.60
CLP$64,740.75
List Price:  CLP$92,490.75
You save:  CLP$27,750
CN¥505.65
List Price:  CN¥722.39
You save:  CN¥216.74
RD$4,053.41
List Price:  RD$5,790.83
You save:  RD$1,737.42
DA9,418.25
List Price:  DA13,455.22
You save:  DA4,036.97
FJ$158.92
List Price:  FJ$227.04
You save:  FJ$68.12
Q543.91
List Price:  Q777.05
You save:  Q233.14
GY$14,591
List Price:  GY$20,845.18
You save:  GY$6,254.17
ISK kr9,761.50
List Price:  ISK kr13,945.60
You save:  ISK kr4,184.10
DH700.89
List Price:  DH1,001.31
You save:  DH300.42
L1,242.28
List Price:  L1,774.76
You save:  L532.48
ден4,000.60
List Price:  ден5,715.39
You save:  ден1,714.79
MOP$561.14
List Price:  MOP$801.67
You save:  MOP$240.52
N$1,290.58
List Price:  N$1,843.76
You save:  N$553.18
C$2,565.92
List Price:  C$3,665.76
You save:  C$1,099.83
रु9,317.52
List Price:  रु13,311.31
You save:  रु3,993.79
S/259.50
List Price:  S/370.73
You save:  S/111.23
K270.29
List Price:  K386.14
You save:  K115.85
SAR262.49
List Price:  SAR375.01
You save:  SAR112.51
ZK1,908.05
List Price:  ZK2,725.90
You save:  ZK817.85
L323.12
List Price:  L461.62
You save:  L138.50
Kč1,619.42
List Price:  Kč2,313.56
You save:  Kč694.13
Ft25,184.29
List Price:  Ft35,979.10
You save:  Ft10,794.81
SEK kr758.43
List Price:  SEK kr1,083.53
You save:  SEK kr325.09
ARS$61,764.46
List Price:  ARS$88,238.72
You save:  ARS$26,474.26
Bs483.72
List Price:  Bs691.07
You save:  Bs207.34
COP$271,807.63
List Price:  COP$388,313.26
You save:  COP$116,505.63
₡35,677.52
List Price:  ₡50,970.07
You save:  ₡15,292.55
L1,718.38
List Price:  L2,454.93
You save:  L736.55
₲523,713.94
List Price:  ₲748,194.84
You save:  ₲224,480.90
$U2,690.81
List Price:  $U3,844.18
You save:  $U1,153.37
zł279.12
List Price:  zł398.76
You save:  zł119.64
Already have an account? Log In

Transcript

Much of the time when someone queries data, they're not necessarily looking at the nitty gritty details, but rather for a total an average, or some other type of aggregate summary data. Occasionally, you might even want a series of these totals. In databases. This is commonly done. So for talking about aggregations, why am I calling it group by there are two parts to aggregating values. The first is to identify the values that you want to aggregate.

And the second is to define the groups by which you want to aggregate. That's the term group by so let's do a small example by hand to demonstrate what I mean. In this small data set. I have only six rows account would tell me that there are six centuries since I have six rows, while a simple sum is 370. But if you look at the information in the table, I have different fields that I may want to calculate across. What if I wanted the number of unique nicknames in my table.

Well, I could create a different group by for each nickname, one for Philip, one for Katie and one for jack, I would count the number of rows that belong to each of Philip, Katie and jack to come up with two, two and two. Afterwards, if I needed it, I could sum the values field that belong to each person, this would be 50 for fill up 124, Katie and 200 for jack. In this scenario, I'm grouping my calculations based upon the entries in the nickname field. If I were to do this with the source type field, I would group the values into access and workbook groups. For these two fields, the counts would be four into while the sums would be 270 and 100. In this initial example, I didn't group by anything, I simply took the sum or the count of my data.

In the second scenario, I grouped by my nickname and I took the sum of value here and I took the count of the rows here in this third scenario I grouped by the source type and did The same sum and count but what if I want to group by multiple fields, I can group by nickname and source type simultaneously. This is a little bit trickier, as it looks at the combination of the nickname and source type fields to identify the unique sets. In this table, I have five combinations of nickname and source type out of six rows. I have filled access Philip workbook, Katie access, Katie workbook, and jack access. The last jack access is a duplicate. So if I were to group those two, it's one single group.

If I do the same sum and count approach across these group by than I would get 24 Philip access 34 Philip workbook 54 Katie access 74 Katie workbook and 204 jack access all four of these the Philip and the KT entries would be a count of one since we only have one entry for the jack access. It's a count of two because I have two entries that have jack And access is the nickname and source type. Now one final scenario we might consider is if I grouped by nickname source type and the values field. So just as grouping by the nickname and the source type generated five rows, grouping by all three of these fields would also create five rows, and it would create these five rows because this jack access 100 and this jack access 100 are the same, so they get grouped into a singular row.

So that's a quick overview of how group by works conceptually, so let's go apply it using our get and transform tool. The get and transform tool uses the same group by terminology while including a user wizard to assist in guiding you through the process of creating these aggregations. To demonstrate how it works, I've consolidated the datasets from our prior lessons, and I've added some new information. The data is within a table. So I'll go to my data tab and use the from table command to turn it into a query. Once I'm in the query editor, we can see that I have data ID, nickname source type and this values field, and I want to calculate a couple of different totals.

The first total, I want this, I simply want to know how many entries I have for each different source type. To determine this, I'm going to click on this group I button along my query editor ribbon to get myself a new window. The window will initially start with whichever column you had last selected within this group by section at the top. I want to make sure this says source type, which it does, but if I wanted to change it, I could click on the drop down and pick any of the fields within my query. Additionally, the wizard will default to a field named count. With the operation count rows selected without any specific column.

This operation will return a count representing the number of rows for each of the combinations defined by the group by area. In this case, since we're grouping by the source type, we will get the number of rows of data that each source type has Within the data set, which is what I'm looking for right now. So given that, I'll press OK. And after just a short piece of calculation, I can update a table. The table now lists each of my source types and the number of rows of data that applied to each. But now what if I wanted to total and the average of the values field? How would I go about getting that?

Well, first, I'm going to introduce another feature the Edit command feature. For those commands that utilize a secondary wizard, you can generally re edit the command in the window again, by going to the applied steps, identifying the row that you want to edit. In this case, I want to edit my grouped rows command and then clicking on the small gear button to the right, the gear button will not show up for all commands, but it does show up for many of the commands that use these secondary wizard interfaces. Thankfully, this group is one of the commands that does support this feature. So I can easily edit the command that I had just created within this bottom section here where it says count I can choose Add aggregation. By clicking this button, I get a new row with a blank name, a count rows operation and a blank column, I'm going to want to call this field total in the second box where it currently wants to count the number of rows, I'm going to click on the drop down and see a list of different aggregation functions that I could apply, I can get the sum the average, the minimum, the maximum, the median, or the count rows, or the count distinct rows, or the all rows will pass on the different row counts for now and focus on these aggregations that are simpler to use for this field.

I want the sum command, so I'll select it and upon selection, my column field immediately gets populated, most of these commands will require a field to operate on. In this case, a sum requires a field to add the components together. Since I want the sum of the value field, I'll click the drop down, and I'll choose the values but I also wanted an average. So I'm going to add another aggregation. I'm going to call it average value. I'll click on the drop down and Choose average for the operation.

And for the column all select values again, before we finish this by pressing OK, there are a few semi hidden commands for each row that might slip by if you weren't looking for them. Whenever your mouse is hovering over one of these rows, there's an ellipsis button that appears to the right, this ellipsis button includes three different commands in move up, a move down, and a delete command, each of which is fairly straightforward and a handy way of reorganizing data as part of the group by function. I could delete the count field here if I didn't want it. Now that I've gone through these commands, let's go ahead and press OK, and see what our results look like. I've now calculated the list of the totals and the averages of the value field for each of my source types. This type of result could be achieved in Excel with pivot tables or with some ifs and average ifs, however, the same information can be gathered quickly and easily within our query.

So far, this group is fairly simple with only a single group by field. This data set A trick to it though, is I've entered a number of duplicate entries, you could manually track down these duplicate entries and adjust for them. But I'm going to use the get and transform to quickly find them and adjust for them. To check for duplicates, I'm going to clear my group by and create a new one, I can check for duplicates, by adding the combination of fields to the group bisection. That includes all fields that should create a unique set. In this case, that would be my data ID, my nickname, my value and my source type field.

With this group by selection, I only want to count the number of rows so I can see where duplicates existed. So I'm going to leave my count and count rows operation. Now I'm going to press OK. Since I expected to have only one of each data ID nickname, data source and value combination. This will pair my data down to what I expect to be unique combinations and then tell me where any duplications may exist. To find a duplication, all I'm looking for are any combinations of entries that had more than one entry, so any count of two or more.

So for these sections that have counts of two or more that indicates that there was a duplication in my base data. However, let's think about what we've done with the group by across our fields. The entry 24, fill up 33. Local has a count of two, which tells me that I previously had two entries. But by grouping all four fields, I now only have one entry. This means that I can add a second group by on top of my previous group by effectively grouping this data to calculate my query results.

Since our previous group by calculated the totals in results across duplications, I'm going to add a new group by that does it on this already grouped by data, I'm going to do the same thing I did before where I group by source type, and then I'm going to add the sum as the total, and then the average value is the average. All in my query steps. You can see now that I have two groups rows command, the first one gets rid of duplicates, and the second one calculates on the grouped by unique values. In most real world scenarios, you'll still want to go back to your original data source, and identify and correct where the duplications occurred there. However, this tool does make it very easy to quickly identify where those duplications exist while still proceeding onwards to generate a usable result for that upcoming management meeting tomorrow.

With that, you should now be able to use the group by command to do some fairly interesting things. The command is incredibly useful for generating aggregations based upon a whole wide variety of filters, then evaluating data for valuable information and it can be stacked multiple times within a single query to generate some really powerful results.

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.