2.1 Data Cleansing

Alteryx Essentials Data Preparation
4 minutes

Transcript

The data cleansing tool replaces and removes inconsistent or improperly formatted data in your inputs. I know that sounds a bit abstract, so let's go through it with an example. As you can see in the illustration on the right, the first picture shows, in red boxes, all the anomalies we have in our data set. Employee ID has random white spaces between the numbers. First name has punctuation issues, age has a couple of random tabs, and favorite coffee has two null values. What we're going to do in this exercise is import our example HR sheet, drag in a data cleansing tool to clean it, and then run our workflow to view the results of our cleaned data set.

Let's start a new workflow by importing spreadsheet 2.1. We'll go to the input data tool and connect to our spreadsheet. In here we have our example HR file from chapter one, but this time it needs to be cleansed. To get a better view of our data, we'll add a browse tool to our input data tool by using the following keyboard shortcut: Ctrl+Shift+V. We'll run our workflow with Ctrl+R. And we can see at the bottom in our preview pane that employee ID has a couple of white spaces. First name has punctuation issues. Age has a couple of random tabs, and under favorite coffee, there are two null values.

A quick way to identify issues with our data set is to check the color of each column. If it's anything but green, there's potentially something wrong with it. Under age, Alteryx tells us that 40% of our records in age aren't okay. And under favorite coffee, there are 20% null values. Having these anomalies in your data might not seem like a big deal, but it can easily throw your data set off. From an ETL or data ingestion point of view, not cleaning the data before ingesting it can corrupt the data set.

And as a user of the data, it can skew the results of your queries. So cleaning the data set is very important before using it. Let's go to the preparation tab and drag a data cleansing tool into our workflow. In the configuration pane on the left, we have several options for selecting which fields we want to clean and how to clean them. The fields in scope for this exercise were employee ID, first name, age, and favorite coffee. Under Replace Nulls, let's replace null string values with an empty string and replace null numeric fields with zero.

Just to quickly cover off: nulls and blank values are two different things. Null means that nothing is stored in that field, whilst blank means a blank value is stored in that field. Under Remove Unwanted Characters, we can untick leading and trailing white spaces, since we didn't have any. We can remove tabs, line breaks, and duplicate white spaces to fix the values that we had in age. Under all white spaces, we'll tick that to fix employee ID, which had white spaces between the numbers. We can leave letters and numbers alone since we actually want those in our data set.
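To make the null versus blank distinction concrete, here is a minimal Python sketch. It is not how Alteryx implements the tool; it only mirrors the Replace Nulls choice described above. The records, field names, and the `replace_nulls` helper are hypothetical, with `None` standing in for a null and `""` for a blank.

```python
# Hypothetical records: None plays the role of a null, "" is a blank.
records = [
    {"first_name": "Ana", "favorite_coffee": None, "age": 31},
    {"first_name": "Ben", "favorite_coffee": "", "age": None},
    {"first_name": "Cy", "favorite_coffee": "Latte", "age": 45},
]

# Assumed field types for this illustration.
STRING_FIELDS = {"first_name", "favorite_coffee"}
NUMERIC_FIELDS = {"age"}


def replace_nulls(record: dict) -> dict:
    """Return a copy with null strings set to "" and null numbers set to 0."""
    cleaned = dict(record)
    for field, value in record.items():
        if value is not None:
            continue  # Blanks and real values are left untouched.
        if field in STRING_FIELDS:
            cleaned[field] = ""
        elif field in NUMERIC_FIELDS:
            cleaned[field] = 0
    return cleaned


cleaned_records = [replace_nulls(r) for r in records]
```

After this step a null coffee value becomes a blank string and a null age becomes zero, while a value that was already blank stays as it was.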

And we can tick punctuation to remove exclamation marks in first name. Let's run our workflow with Ctrl+R. And we can see that our data set has been cleansed. All our columns show a green color, and we can see that the white space has been removed from this record. First name has the punctuation removed. The tabs in age are gone, and our null values in favorite coffee have been removed.
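For readers who think in code, the character-removal options ticked above can be approximated in Python. This is only a sketch under assumptions: the function names and sample values are hypothetical, and the exact rules Alteryx applies (for example, whether a tab is dropped or replaced) are not stated in this lesson. Here tabs and line breaks are simply dropped.

```python
import re
import string


def remove_tabs_breaks_duplicate_spaces(text: str) -> str:
    """Drop tabs and line breaks, then collapse runs of spaces to one space."""
    text = re.sub(r"[\t\r\n]", "", text)
    return re.sub(r" {2,}", " ", text)


def remove_all_whitespace(text: str) -> str:
    """Remove every whitespace character, including spaces between digits."""
    return re.sub(r"\s+", "", text)


def remove_punctuation(text: str) -> str:
    """Remove ASCII punctuation such as exclamation marks."""
    return text.translate(str.maketrans("", "", string.punctuation))


# Hypothetical dirty values resembling the anomalies in the lesson.
employee_id = remove_all_whitespace("10 0 23")
first_name = remove_punctuation("Sam!")
age = remove_tabs_breaks_duplicate_spaces("4\t2")
```

Each function maps to one checkbox, so they can be applied per field in the same way the tool is configured per column.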
