Dealing with Files containing Data using R

Learning R through an Example - Part 1
16 minutes

Transcript

The data we will use in this program is the prices of commodities that I have been tracking for a very long time now. I use this data because I hold the copyright on it, so I can use it in this program without having to deal with any issues of validity and ownership. The data should be sufficient to discuss the different aspects of R that we have planned for the program. You can try all the concepts we discuss using this data, and you can also get hold of your own data, from yourself or from your organization, and apply these concepts to it. First, let us look at the data. I have opened the file for you. The data, as you can see, is a CSV file, a comma-separated values file.

It has these attributes: observation date, potato price, onion price, tomato price, gold price, the Sensex index and the Nifty index. The first thing we have to do is read the data into our R environment. Let us see how to do it. To read the data we give the command read.csv. The first thing we give here is the file name: within quotes, give the complete path followed by the file name. Here that is the price index file as of 23rd of April, a .csv file.

This data has a header, so we say header = TRUE, and the values are separated by commas, so we say sep = ",". This data has to be stored in a variable, and the variable we store it in is a data frame. For this purpose we take df as the variable and assign the result of read.csv to it. Let us run this command. It executes successfully, so we don't have to worry. Now let us see what the data looks like: we say df and run this.
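The read step described above can be sketched as follows. This is only an illustration: the course file is not available here, so the sketch writes a tiny stand-in CSV to a temporary file first. The column names (obs_date, potato, onion, tomato, gold, sensex, nifty) and all values except the Nifty figure quoted later in the lesson are assumptions made up for the example.

```r
# Stand-in for the course's price file: column names and values are
# assumptions for illustration, not the real data.
csv_lines <- c(
  "obs_date,potato,onion,tomato,gold,sensex,nifty",
  "2017-12-29,20,30,25,29000,34000,10500",
  "2018-01-01,18,40,22,29300,33800,10400",
  "2018-01-29,16,38,20,30100,36000,11130.4"
)
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# The first line of the file is a header, and fields are separated by commas.
df <- read.csv(csv_path, header = TRUE, sep = ",")

# Typing the variable name (or printing it) shows the whole data frame.
print(df)
```

With your own file, csv_path would simply be the complete path to that file in quotes.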

Now you see all the information is shown, and there is a lot of it. Let us see just the top few rows and the last few rows. To see the top few rows we say head(df). We run this, and now you can see the first six rows of the data. Similarly, to see the last few rows we say tail(df). We run this, and it shows the last few data points in this data set.
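As a small sketch of head and tail, assuming a made-up ten-row data frame in place of the real price data (the column names obs_date and nifty are assumptions):

```r
# Illustrative data frame with 10 rows; names and values are made up.
df <- data.frame(
  obs_date = as.character(seq(as.Date("2018-01-01"), by = "day", length.out = 10)),
  nifty    = seq(10400, by = 10, length.out = 10)
)

first_rows <- head(df)     # first six rows by default
last_rows  <- tail(df)     # last six rows by default
top_three  <- head(df, 3)  # an explicit row count can be given as well

print(first_rows)
print(last_rows)
```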

Now we need to know the structure of this data. To see the structure we say str(df). When we run this it shows the structure of the data. As you can see, the observation date is a factor: although these are dates, they are stored as strings right now. Potato price, onion price, tomato price, gold price, Sensex and Nifty are all numbers. Now let us see how we can look at individual rows and columns of the data. First, the number of rows: we say nrow(df), and it shows that we have 1939 rows. Similarly, for the number of columns we say ncol(df), and it shows that we have seven columns of data.
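A minimal sketch of these inspection commands, on a made-up three-row data frame with assumed column names. Note that from R 4.0.0 onwards strings are no longer turned into factors by default, so the sketch sets stringsAsFactors = TRUE to reproduce the factor column seen in the lesson.

```r
# Made-up data with assumed column names; stringsAsFactors = TRUE mimics
# the factor date column shown in the lesson.
df <- data.frame(
  obs_date = c("2018-01-01", "2018-01-02", "2018-01-03"),
  potato = c(18, 18, 19),
  onion  = c(40, 41, 39),
  tomato = c(22, 23, 21),
  gold   = c(29300, 29350, 29400),
  sensex = c(33800, 33900, 34000),
  nifty  = c(10400, 10450, 10500),
  stringsAsFactors = TRUE
)

str(df)              # one line per column: name, type, first few values
n_rows <- nrow(df)   # number of rows
n_cols <- ncol(df)   # number of columns
print(c(rows = n_rows, columns = n_cols))
```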

Suppose we want to see only a few columns of the data. We say df followed by square brackets; the first parameter is the row number and the next parameter is the column number. We leave the row number empty and say comma one, so df[, 1], to see the first column. We run it, and the date column is displayed here. We can also see multiple columns at the same time. We say df followed by brackets; again the first parameter, the row number, is left empty, and for the columns we give a vector: c(1, 4, 5, 6).

So we are going to see four columns of data. We run this and we get four columns of data. As this output is difficult to read, we can wrap it in head and see only the first few rows. There you are: the observation date, tomato, gold and Sensex. We can see particular rows as well: we say, show me row numbers 10, 20, 31, 45 and 65. So we can see the particular rows that we want to analyze.

This way we can get exactly the rows that we want to see. Don't forget to put the comma after the row numbers; otherwise R treats them as column numbers and gives an error. Now, there is another way to see a particular column: we say df, then the dollar sign, and then the field name. So I say df$nifty. Let us run this. You see all the values of the nifty column displayed. This is very useful, and it can be used in many other contexts as well, which you will see very shortly.
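The row and column selections above can be sketched like this. The data frame is a made-up six-row stand-in with assumed column names in the same order as the lesson's file (date, potato, onion, tomato, gold, Sensex, Nifty), so the row numbers used here are smaller than the ones in the video.

```r
# Made-up stand-in data; column names and values are assumptions.
df <- data.frame(
  obs_date = c("2018-01-01", "2018-01-02", "2018-01-03",
               "2018-01-04", "2018-01-05", "2018-01-08"),
  potato = 11:16, onion = 21:26, tomato = 31:36,
  gold = 41:46, sensex = 51:56, nifty = 61:66
)

dates_only <- df[, 1]              # first column, all rows
some_cols  <- df[, c(1, 4, 5, 6)]  # several columns at once
print(head(some_cols, 3))          # only the first few rows of that selection

# Particular rows: note the comma after the row numbers.
# Without it, df[c(2, 4, 6)] would select columns 2, 4 and 6 instead.
some_rows <- df[c(2, 4, 6), ]
print(some_rows)

nifty_values <- df$nifty           # a single column by name
print(nifty_values)
```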

Now, we can find data for a particular condition as well. Let us say we want to find the date on which the Nifty was at its maximum. We say df$nifty and then, in brackets, we give the condition. For the condition we say which, and inside it df$nifty == max(df$nifty); the double equals sign is the equality operator. Now let us run this. It says the maximum value of the Nifty was 11130.4. But we want to see the date, right? So we take the observation date column of df instead and give the same condition.

When we run this, we see that the Nifty was at its maximum on 29th of January 2018. Now, what if we want to see all the columns for that particular date? Then we can use the subset command. The data frame we are using is df, so we say subset(df, and then we give the condition; we just copy and paste the condition we gave before, df$nifty == max(df$nifty). Now let us run this. When we run this, we get this output: you see the entire tuple where this value is found.
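Here is a minimal sketch of finding the maximum with which and with subset. The three rows are made up for illustration, apart from the 11130.4 value on 29th of January 2018 quoted in the lesson; the column names obs_date, sensex and nifty are assumptions.

```r
# Made-up rows around the maximum; only the middle nifty value comes
# from the lesson, the rest is illustrative.
df <- data.frame(
  obs_date = c("2018-01-25", "2018-01-29", "2018-01-30"),
  sensex   = c(36000, 36300, 36100),
  nifty    = c(11000, 11130.4, 11050)
)

# which() returns the positions where the condition is TRUE.
max_pos   <- which(df$nifty == max(df$nifty))
max_value <- df$nifty[max_pos]      # the maximum value itself
max_date  <- df$obs_date[max_pos]   # the date on which it occurred
print(max_value)
print(max_date)

# subset() returns the complete row(s) that satisfy the same condition.
max_record <- subset(df, df$nifty == max(df$nifty))
print(max_record)
```

If more than one row shared the maximum, both which and subset would return all of them.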

Now, let us look at the data once again. We say head(df). If we run this, you will notice that the prices of potato, onion and tomato are the same in all the rows you are seeing. This is because I started recording this data only from first of January 2018; the remaining rows I have filled in with the mean value that I found across the observation dates. So let us see only the data from first of January 2018. We say subset(df, and as the condition, the observation date column of df greater than or equal to "2018-01-01", first of January 2018. When we run this, let us see what happens. It gives an error: R is not able to evaluate this comparison, because the column is stored as strings; it is not a date.

So let us now convert this column into a date column. To do this we need to include a library; I have already pasted the code for you. We have to use the package lubridate. So let us load the package called lubridate. We run this code, and now the lubridate package is included. Now we update this column's data type: we say df$, the observation date column, and assign to it. To convert to a date we need the function as.Date. To as.Date we supply what we want converted, which is the observation date column of df. And the existing data, if you notice, is in the format year, month and day.

So we say "%Y-%m-%d": percent capital Y, hyphen, percent m, hyphen, percent d. You can see this in the syntax of the function. Now let us run this. It does not give any errors. Let us see whether the data actually got converted or not: we say head(df). Now you see the dates are there in our data. Let us also look at the structure to see whether the data type actually got changed, so we say str(df). Now you will notice that the column is in Date format. Now we can run our subset command once again. We say subset(df, the observation date greater than or equal to, and this time we convert that string into a date as well: as.Date of "2018-01-01", followed by the format "%Y-%m-%d". We need one more closing bracket.

Now let us run this. When we run this, there is a lot of data, and you notice that these rows are all in 2018. Let us wrap this in head and look at the first rows, which will give complete proof of this. We say head of this and run it. Now you see the data starts from first of January 2018.
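The conversion and the date-based subset can be sketched as below, on four made-up rows with assumed column names. The as.Date function itself is part of base R, so this particular step runs even without lubridate loaded.

```r
# Made-up data; the date column starts out as a factor of strings,
# as in the lesson.
df <- data.frame(
  obs_date = c("2017-12-30", "2017-12-31", "2018-01-01", "2018-01-02"),
  nifty    = c(10500, 10520, 10400, 10450),
  stringsAsFactors = TRUE
)

# Convert the column to Date; the existing text is in year-month-day form.
df$obs_date <- as.Date(as.character(df$obs_date), format = "%Y-%m-%d")
str(df)  # obs_date is now shown as Date

# Compare against a Date, not a string, when subsetting.
cutoff    <- as.Date("2018-01-01", format = "%Y-%m-%d")
from_2018 <- subset(df, df$obs_date >= cutoff)
print(head(from_2018))
```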

So we have converted this column from a string type to a date, and we can now perform date operations on this column. We will see those very shortly. Now, we can also update data in particular columns of a data frame. Let us see how to do that. As I told you, I have recorded potato, onion and tomato prices from first of January 2018. However, we have this data in the data frame from first of January 2013. So let us update it. First we say df$potato, and we do not want to update all the rows.

We only want to update the rows from before first of January 2018. So we say which, and we copy the earlier condition and paste it here, and we change it to less than. So for all the rows where the date is less than first of January 2018, we want to update the data, and we set the value to NA. NA stands for not available. Now we can run this command. This has gone through. Now let us verify this by looking at the potato prices. We say df$potato, and we wrap it in head so that we don't see a lot of data.

We say head with the first 20 rows, and you see all the data has been changed to NA. We can do the same thing for the tomato and onion prices. We copy this command and change potato to tomato, and we copy it again and change it to onion. We run these two commands, and now it is done. We can say, show me the first 20 rows, and they all show NA. We can also add a new column to the data frame. Let us say we want to add the month of the observation date. To do that, we say df and add a new column for the observation month, called OBS month.
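A minimal sketch of this update, assuming a made-up four-row data frame whose date column has already been converted to Date. The column names obs_date, potato, onion and tomato are assumptions.

```r
# Made-up data; the first two rows fall before the cutoff date.
df <- data.frame(
  obs_date = as.Date(c("2017-12-30", "2017-12-31", "2018-01-01", "2018-01-02")),
  potato   = c(15, 15, 18, 19),
  onion    = c(30, 30, 40, 41),
  tomato   = c(25, 25, 22, 23)
)

cutoff   <- as.Date("2018-01-01", format = "%Y-%m-%d")
old_rows <- which(df$obs_date < cutoff)  # positions of rows before the cutoff

# Set the prices in those rows to NA (not available), one column at a time.
df$potato[old_rows] <- NA
df$tomato[old_rows] <- NA
df$onion[old_rows]  <- NA

print(head(df, 20))
```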

So we say df$ followed by the new column name, and we assign to it the month. The month function is available in the lubridate package: month of the observation date column of df. When we run this, it goes through. Now let us check the structure of our data: we say str(df). Now you see we have an extra column called OBS month, and it is a numeric column in which the month is stored. Let us see the first 20 rows, and we can see that there is a column called OBS month and it holds the month of the date. We can say tail of this and check the last few rows as well.
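Adding the month column can be sketched as follows. The dates are made up, and the column names obs_date and obs_month are assumptions about how the names are spelled in the course file. The sketch uses the month function from lubridate when that package is installed and otherwise falls back to base R's format, which is a substitute chosen here so the example still runs, not something shown in the lesson.

```r
# Made-up dates; column names are assumptions.
df <- data.frame(
  obs_date = as.Date(c("2017-12-31", "2018-01-01", "2018-04-23"))
)

if (requireNamespace("lubridate", quietly = TRUE)) {
  # month() from lubridate returns the month number of each date.
  df$obs_month <- lubridate::month(df$obs_date)
} else {
  # Base R fallback: format the month as text, then convert to a number.
  df$obs_month <- as.numeric(format(df$obs_date, "%m"))
}

str(df)           # the new obs_month column is numeric
print(tail(df))   # check the last few rows as well
```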

And we notice that the month number is stored there. Let us summarize what we have seen so far. We saw how to read data from a CSV file. We saw how to view data: particular columns, one or more of them, and particular rows, one or more of them. Then we saw how to view data for a given condition. We also learned how to subset data.

We introduced you to the lubridate package and showed you how to handle dates. We saw how to convert the data type of a column from a string to a date. Then we saw how to update data in particular columns, and we saw how to add a new column to the data frame. We will conclude part one of this program here. Please join us for part two, where we will cover the remaining topics: dealing with databases, creating graphs and creating a presentation.

Please join us for that session. We thank you for watching our video. We hope you enjoyed it as much as we enjoyed creating it for you. Please join us for part two, where we continue this program, and look out for other programs that we will be launching in the future. Thank you once again, and goodbye.
