Understanding different types of data

Machine Learning Using Python Statistics and Exploratory Data Analysis
5 minutes

Transcript

Hello everyone, welcome to the course on machine learning with Python. In this video, we shall begin with a very interesting and important concept called exploratory data analysis. Every machine learning problem begins with a collection of data relevant to the problem, followed by exploratory data analysis. In this short video, we will first explore the different types of data and variables. So let's get started. So, what is exploratory data analysis? Exploratory data analysis, in simple terms, is summarizing the data we have gathered in a meaningful way so as to get the necessary insight about the data. The tools, tricks, and rules used to summarize the collected data are all part of exploratory data analysis.

Now, why is EDA necessary? EDA is an important part of statistical analysis. It is also called descriptive statistics. It helps us understand and visualize the collected data. Sometimes it is the end in itself; that's all we need. For example, let's say we want to summarize the results of students in a school or college. Then the only thing we have to do is collect the data and do the exploratory data analysis, that's all. Now, what is data? Data are pieces of information about individuals, organized into variables.

So by individual we mean a particular person or object, and by variable we mean a particular characteristic of the individual. Let's take an example. So this is a data set, okay, and as you can see, the individuals are arranged across rows. This is basically a data set from a hospital, where the patient name or the patient identification number is stored in the first column, then the gender of the patient is stored in the second column, followed by age, weight, height, smoking habit, and race. So usually variables are arranged across columns. Here the variables are gender, age, weight, height, smoking habit, race, etc.

The individuals, or the patients here, are stored across the rows. So if we want to retrieve the record of a particular patient, let's say patient number three, then we go through the rows, identify the row that belongs to patient number three, and read the entire row. This particular row tells us that patient three has gender female, age 73, weight 155 pounds, height 59 inches, smoking habit no, and the corresponding race value. Let's get deeper into the concept called variables. Variables can be classified into one of the following two types: one is called categorical and the other is called quantitative. So what are categorical variables? Categorical variables take category or level values and place an individual into one of several groups. Each observation can be placed in only one category, and the categories are mutually exclusive.
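The row-and-column layout described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the names `patients`, `get_patient`, and `get_variable` are hypothetical, only patient 3's values come from the lecture, the other two rows are made up, and the race column is left out because its values are not given.

```python
# Each dict is one individual (a row); each key is one variable (a column).
# Only patient 3's values come from the lecture; patients 1 and 2 are made up.
patients = [
    {"patient_id": 1, "gender": "male", "age": 45,
     "weight_lb": 180, "height_in": 70, "smoking": "yes"},
    {"patient_id": 2, "gender": "female", "age": 32,
     "weight_lb": 130, "height_in": 64, "smoking": "no"},
    {"patient_id": 3, "gender": "female", "age": 73,
     "weight_lb": 155, "height_in": 59, "smoking": "no"},
]


def get_patient(records, patient_id):
    """Return the full row for one individual, or None if it is not found."""
    for row in records:
        if row["patient_id"] == patient_id:
            return row
    return None


def get_variable(records, name):
    """Return one column: the value of a variable for every individual."""
    return [row[name] for row in records]


patient_3 = get_patient(patients, 3)   # read across one row
ages = get_variable(patients, "age")   # read down one column
print(patient_3)
print(ages)
```

Reading a row gives every variable for one individual; reading a column gives one variable for every individual. In practice a table library would usually hold this data, but the idea is the same.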

Okay. Now, what are quantitative variables? Quantitative variables take numerical values and represent some kind of measurement. So this was our previous example. Can you identify which are the categorical variables and which are the quantitative variables? In our example, gender and smoking habit are categorical variables. Why is gender a categorical variable? Because it can take only two values, male or female, and each observation is either male or female. Why is smoking habit a categorical variable? Because the patient can be either smoking or non-smoking.

So this also can take only two values. If a particular variable can take only a few selected values, then it is called a categorical variable. Now note that we have encoded the smoking habit as one or two: one is no and two is yes. Though these are numbers, they do not have any arithmetic significance. That means we cannot add one plus two to produce another meaningful number, three, and we cannot take the average of this column. So these numbers have no arithmetic significance. They are just codes for a categorical variable.
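A short sketch of why coded categories should be counted rather than averaged. The coding (1 = no, 2 = yes) is the one from the lecture; the five sample values and the names `SMOKING_LABELS` and `smoking_codes` are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

# The lecture's coding: 1 = no, 2 = yes. The sample column is made up.
SMOKING_LABELS = {1: "no", 2: "yes"}
smoking_codes = [1, 2, 1, 1, 2]

# Python will compute an average of the codes, but the number is not a
# smoking habit; it depends entirely on the arbitrary choice of 1 and 2.
meaningless_average = mean(smoking_codes)

# The meaningful summary of a categorical variable is a count per category.
smoking_labels = [SMOKING_LABELS[code] for code in smoking_codes]
counts = Counter(smoking_labels)

print(meaningless_average)
print(counts)
```

If the codes were changed to 0 and 1, the average would change but the counts per category would stay the same, which is one way to see that only the counts describe the data.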

Now age, weight, and height are quantitative. Age is quantitative because it can take continuous values within a certain range. Similarly, weight and height. What about race? Race is also a categorical variable. Okay, good. Now we took a random sample from the 2000 US Census.

Here is a part of the data set. So let's test our understanding of quantitative and categorical variables. So this is the US Census data. As you can see, it has four columns: state, zip code, family size, and annual income. Can you identify the individuals described by these data? It could be states, people living in the United States in the year 2000, or people with families in the year 2000.

So it is nothing but the people living in the United States in the year 2000. These are the individuals. Okay, so this is just the first six rows of the data. The full data is a very long file, because the US Census is completed by the people living in the United States, right? Now, what type of variable is the zip code? From this data, it looks like the zip code is quantitative in nature, but the zip code has no arithmetic significance.

That means we cannot meaningfully sum all the zip codes and produce an average. Fine. The zip code basically specifies a geographic location. So that is why the zip code is not a quantitative variable; it is a categorical variable. Fine. Now, what type of variable is annual income? The annual income, as described by the fourth column, is a quantitative variable because it takes continuous values within a certain range, and we can also do arithmetic operations on annual income. Let's say we can compute the average annual income. Hence annual income is a quantitative variable.
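The zip code versus annual income distinction can be sketched as follows. The four column names follow the lecture; every row value and the variable names (`census`, `zip_counts`, `average_income`) are invented for illustration. Storing zip codes as strings is one possible way to signal that they are labels, not measurements; it also preserves leading zeros.

```python
from collections import Counter
from statistics import mean

# Made-up rows with the four columns named in the lecture.
# Zip codes are stored as strings because they are labels, not quantities.
census = [
    {"state": "Massachusetts", "zip_code": "02139",
     "family_size": 3, "annual_income": 52000},
    {"state": "California", "zip_code": "90210",
     "family_size": 4, "annual_income": 61000},
    {"state": "Massachusetts", "zip_code": "02139",
     "family_size": 2, "annual_income": 43000},
    {"state": "Texas", "zip_code": "73301",
     "family_size": 5, "annual_income": 48000},
]

# Categorical variable: summarize by counting individuals per category.
zip_counts = Counter(row["zip_code"] for row in census)

# Quantitative variable: arithmetic such as an average is meaningful.
average_income = mean(row["annual_income"] for row in census)

print(zip_counts)
print(average_income)
```

Keeping the zip code as text means an accidental attempt to average it fails instead of quietly returning a meaningless number.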

Okay, so in the next video we will examine the distributions of quantitative and categorical variables. So see you in the next lecture. Thank you.
