Data frame in pandas

Python Programming Introduction to pandas
13 minutes

Transcript

In this lesson, we'll look at the basic operations with data frame and series data. So let's start with data frames. Just as in NumPy we can create arrays on our own, in pandas we can create a data frame on our own. Let's see how we can do it. So let's first create a Python dictionary with two keys and three corresponding values for each of them. Before creating it, I'll import the pandas package. Now here I'll be creating the dictionary.

So I write the name of the dictionary, which is equal to, and within the curly brackets I have to write age and then a colon. These are its values: 20, 25, 30. Then I'll put a comma and, again, write height. Then I'll put a colon and, within the brackets, write its values. So run this. Now, to turn this dictionary into a data frame, we have to write data_1 = pd.DataFrame.

Then, within the parentheses, write the name of the dictionary. In the next line, just print data_1 and run this. See, in this way we got our data frame using pandas. Now let's see what we get if we use the info function that we used in our previous lesson. So write data_1.info() and run this. So this is a way to create a data frame on our own by converting it from a dictionary.
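The steps above can be sketched as follows. This is a minimal sketch: the dictionary name my_dict and the height values are assumptions made for illustration, since the lesson only states the three ages.

```python
import pandas as pd

# my_dict and the height values are hypothetical; only the ages come from the lesson.
my_dict = {"age": [20, 25, 30], "height": [160, 170, 180]}

# Each key becomes a column label and each list becomes that column's values.
data_1 = pd.DataFrame(my_dict)
print(data_1)

# info() prints the column names, the non-null count per column, and the dtypes.
data_1.info()
```

Every list in the dictionary must have the same length, because each one fills a whole column.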

At this point, let's import the Titanic passenger data and then print it. To import it, we use data = pd.read_csv, and here I have to write the location where my data set is stored. So I copy the location from here and paste it over here. Then I change the backslashes to forward slashes and, at the end, write Titanic.csv. Now run it.

And let's look at the data set with data.head(). So this is the data set. First, let's analyze the age column. To do that, write data, then square brackets, and within quotes write the column name, age, and run it. So this program takes all the values of the age column.
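The import and column selection can be sketched as below. Because the lesson's file location is not given, this sketch writes a few hand-typed stand-in rows to a temporary file and reads them back; the file name titanic_sample.csv, the passenger names, and the column labels Name and Age are assumptions, not the lesson's actual file.

```python
import os
import tempfile

import pandas as pd

# Hand-typed stand-in rows (hypothetical); the lesson reads the full Titanic CSV.
sample_csv = (
    "PassengerId,Name,Age\n"
    "1,Passenger A,22\n"
    "2,Passenger B,38\n"
    "3,Passenger C,26\n"
    "4,Passenger D,35\n"
    "5,Passenger E,35\n"
    "6,Passenger F,\n"
    "7,Passenger G,54\n"
)

with tempfile.TemporaryDirectory() as folder:
    # Forward slashes are accepted in paths on Windows as well as elsewhere.
    path = os.path.join(folder, "titanic_sample.csv").replace("\\", "/")
    with open(path, "w") as f:
        f.write(sample_csv)
    data = pd.read_csv(path)

print(data.head())   # first five rows of the data frame
print(data["Age"])   # a single column, selected by its label
```

The empty Age field in the sixth row is read as a missing value.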

However, I want to point something out. Look at the first five values: 22, 38, 26, 35, 35. And then there is a NaN. NaN means "not a number". We'll find it very often when we do this kind of analysis on data. So at this point, let's understand why this non-numerical value is present in this line, that is, at this index. Let's go back to the table first. This one is my data.

Now, as you can see here, there is no value in this place; there is an empty box. As I mentioned in the previous lesson, when we use the info function to see the information about the table and we see columns that have fewer values than the total sample, it means that those columns have missing values. So let's write it over here once again: write data.info() and run it. See, the age column has 714 values. So this is the number of values present, and the rest are missing.
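A small sketch of how missing values show up in the counts. The rows are hand-typed stand-ins, and count() and isna() are pandas methods that the lesson itself does not use; they are added here only to make the numbers that info() reports explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows; one age is missing.
data = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D",
             "Passenger E", "Passenger F", "Passenger G"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54],
})

data.info()                       # Age shows fewer non-null values than Name

total = len(data)                 # number of rows in the sample
present = data["Age"].count()     # values that are actually present
missing = data["Age"].isna().sum()  # values that are NaN
print(total, present, missing)
```

In the lesson's data the same comparison is 891 rows against 714 age values.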

Having said that, let's assign the column to age. Let's write age, which is equal to data, and within square brackets write age, and I print the first seven values, so a colon and then seven, and run it like this. So here is our result. Now, I want to ask you for a moment of attention: when you take a single row or column of a data frame, it is classified as the Series data type. A Series, like all data types, has functions that are similar to, but not the same as, those of the data frame. Let me show you an example. For a Series, we can get the index by writing age.index.

And then we can also get the values by writing age.values. As you see, we printed here just the first seven values. This was to show you two functions that both the Series and the data frame have. Now, if we print age, writing it as follows, we see that it is Series data. That said, let's now print our data frame again. So let's write data.head() and run it. Okay, so we have said that the first column on the left is our index column, but when we print an individual column, like the age column in this case, we can't really find it useful. So what we need to do is take a more significant column and turn it into the index of our data frame.
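The Series attributes mentioned above can be sketched like this. The rows are hypothetical stand-ins, and using type() to confirm the data type is an assumption about what was typed on screen.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows with the default integer index.
data = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D",
             "Passenger E", "Passenger F", "Passenger G", "Passenger H"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2],
})

# Selecting one column gives a Series; [:7] keeps the first seven values.
age = data["Age"][:7]

print(age.index)    # the row labels of the Series
print(age.values)   # the underlying values as a NumPy array
print(type(age))    # confirms that a single column is a Series
```

A data frame also has index and values, which is why the two types feel similar.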

Let's try, for example, turning the name column into our index. So let's write data = data.set_index, and here within the parentheses write name in quotes. And let's print. Now see, the name column has moved to the first position and has become the table index. To do this, it is fundamental to remember the set_index function. This function literally sets the index we want for the table, and we really need it in specific cases like this.
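A minimal sketch of this step, assuming the column is labelled Name and using hypothetical stand-in rows:

```python
import pandas as pd

# Hypothetical stand-in rows.
data = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Age": [22, 38, 26],
})

# set_index returns a new data frame, so the result is assigned back to data.
data = data.set_index("Name")
print(data.head())
```

After the call, Name is no longer a regular column; it labels the rows instead.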

Let's now get back to our age column. So write data, then a square bracket with age, and here, again in square brackets, a colon, and write five. So by doing this, we have here the names of the people, because the name column is now our index column, and of course their ages. Once we have this, we can also get the age of a specific person that interests us, just by reassigning the age column to a name, writing as follows. So let's write here: age is equal to data, and within the square brackets write age.

And in the next line, write age, and within square brackets select a name. Suppose I want to select the first name here, this one; so just select it, copy it, paste it here, and run it. So here it is: we see that the program printed the age of the person we indicated, and it printed it correctly, that's 22. Now, when we have a column of numbers only, we can also use it to do mathematical operations. So let's first print the current values of age, writing age like this. Once we have done this, let's now add two to all the values.
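The lookup by name described above can be sketched as follows. The passenger names are hypothetical placeholders; in the lesson the name is copied from the first row of the Titanic data.

```python
import pandas as pd

# Hypothetical stand-in rows, with the Name column already set as the index.
data = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C",
             "Passenger D", "Passenger E", "Passenger F"],
    "Age": [22, 38, 26, 35, 35, 54],
}).set_index("Name")

# The first five ages now appear next to the names instead of 0, 1, 2, ...
print(data["Age"][:5])

# With names as the index, one person's age can be looked up by name.
age = data["Age"]
first_age = age["Passenger A"]
print(first_age)
```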

So write age = age + 2, and now print age. See, the values have increased by two. We can also use multiplication. For example, let's copy the whole program and paste it over here, and instead of plus two, write star 1.3. Run it. In this case, all the values have been increased by 30%.
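A short sketch of these element-wise operations on a hypothetical age column. The variable names plus_two and increased are introduced here only so that both results can be compared with the original values.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including one missing value.
age = pd.Series([22, 38, 26, np.nan])

# Arithmetic with a single number is applied to every value of the Series.
plus_two = age + 2      # every age increased by two
increased = age * 1.3   # every age increased by 30%

print(plus_two)
print(increased)
```

Missing values stay missing: NaN plus or times any number is still NaN.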

Of course, we also have the possibility of subtracting or dividing in the same way, by copying the same line again and using the symbols we already know. So I'll leave it to you; you can practice it on your own. Now, as far as mathematical operations are concerned, we can also calculate the average value in the exact same way that we did it in NumPy. So let's just write age.mean() and run it. Here it is. There are also other similar functions, like the minimum: write age.min() and run it.
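These aggregations can be sketched on a hypothetical age column as below; max() follows the same pattern and is included for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value.
age = pd.Series([22, 38, 26, 35, 35, np.nan, 54])

# By default these methods skip missing values instead of returning NaN.
average = age.mean()
youngest = age.min()
oldest = age.max()

print(average, youngest, oldest)
```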

See. Similarly, we can write age.max(), and it will extract the maximum age. Now, it should be added that if we don't want to do these calculations one column or one operation at a time, we can use a data frame function, describe, which lets us do a series of mathematical operations all together. So let's just write data.describe() and run it. From here we can see how this function actually performs a series of calculations.

First, we can see that the columns for which the program performs the operations are those with numbers. So let's go and verify it in the table. Here we can see that the columns that don't have numbers are automatically filtered out. Now that we have seen this, let's list which mathematical operations we can perform with this function. The first operation is count, which basically counts the number of valid values. In our case, we see 891 for all columns except the age column, which, as we have seen, has empty values.

So it only has 714. Then we have the average value, after that the standard deviation, the minimum and the maximum, and finally the quartiles: one quarter, two quarters, and three quarters of the data, shown as 25%, 50%, and 75%. So with this function, we can already have a clear statistical basis of the data we have. So we have come to the end of our lesson. Here we have learned how to use a data frame in pandas and how to use different data frame functions. So I'll be ending my video here. Keep practicing.
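For practice, the describe step from this lesson can be sketched on a small hypothetical data frame. The Fare column and all values are made up for illustration; with the real Titanic data the counts would be 891 and 714 as discussed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows: one text column and two numeric columns.
data = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D",
             "Passenger E", "Passenger F", "Passenger G"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05, 8.46, 51.86],
})

# describe() summarises numeric columns only; Name is left out automatically.
summary = data.describe()
print(summary)
```

The rows of the result are count, mean, std, min, 25%, 50%, 75%, and max, and the count for Age is lower than for Fare because of the missing value.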

See you in the next video. Thank you.
