Handling Files with Pandas - TXT, CSV, JSON, XLSX

Python 3: Automating Your Job Tasks Superhero Level: Automate Data Analysis Tasks with Python 3
17 minutes
Share the link to this page
Copied
  Completed
You need to have access to the item to view this lesson.
One-time Fee
$99.99
List Price:  $139.99
You save:  $40
€91.97
List Price:  €128.76
You save:  €36.79
£77.70
List Price:  £108.78
You save:  £31.08
CA$138.25
List Price:  CA$193.55
You save:  CA$55.30
A$152.72
List Price:  A$213.82
You save:  A$61.09
S$134.20
List Price:  S$187.89
You save:  S$53.68
HK$780.59
List Price:  HK$1,092.86
You save:  HK$312.27
CHF 88.44
List Price:  CHF 123.82
You save:  CHF 35.38
NOK kr1,100.24
List Price:  NOK kr1,540.39
You save:  NOK kr440.14
DKK kr687.33
List Price:  DKK kr962.29
You save:  DKK kr274.96
NZ$169.82
List Price:  NZ$237.75
You save:  NZ$67.93
د.إ367.26
List Price:  د.إ514.18
You save:  د.إ146.92
৳11,743.81
List Price:  ৳16,441.81
You save:  ৳4,697.99
₹8,371.41
List Price:  ₹11,720.31
You save:  ₹3,348.90
RM465.70
List Price:  RM652
You save:  RM186.30
₦165,740.53
List Price:  ₦232,043.38
You save:  ₦66,302.84
₨27,788.10
List Price:  ₨38,904.46
You save:  ₨11,116.35
฿3,584.57
List Price:  ฿5,018.55
You save:  ฿1,433.97
₺3,297.92
List Price:  ₺4,617.23
You save:  ₺1,319.30
B$564.64
List Price:  B$790.52
You save:  B$225.88
R1,827.40
List Price:  R2,558.44
You save:  R731.03
Лв180.11
List Price:  Лв252.16
You save:  Лв72.05
₩138,473.15
List Price:  ₩193,867.95
You save:  ₩55,394.80
₪365.84
List Price:  ₪512.19
You save:  ₪146.35
₱5,845.41
List Price:  ₱8,183.81
You save:  ₱2,338.40
¥15,371.96
List Price:  ¥21,521.36
You save:  ¥6,149.40
MX$1,849.41
List Price:  MX$2,589.25
You save:  MX$739.84
QR364.19
List Price:  QR509.88
You save:  QR145.69
P1,352.81
List Price:  P1,894
You save:  P541.18
KSh12,979.15
List Price:  KSh18,171.33
You save:  KSh5,192.18
E£4,833.51
List Price:  E£6,767.11
You save:  E£1,933.60
ብር5,772.08
List Price:  ብር8,081.15
You save:  ብር2,309.06
Kz87,254.87
List Price:  Kz122,160.31
You save:  Kz34,905.44
CLP$94,724.52
List Price:  CLP$132,618.12
You save:  CLP$37,893.60
CN¥724.96
List Price:  CN¥1,014.98
You save:  CN¥290.01
RD$5,909.73
List Price:  RD$8,273.86
You save:  RD$2,364.12
DA13,421.76
List Price:  DA18,791
You save:  DA5,369.24
FJ$225.87
List Price:  FJ$316.23
You save:  FJ$90.35
Q774.58
List Price:  Q1,084.45
You save:  Q309.86
GY$20,907.04
List Price:  GY$29,270.70
You save:  GY$8,363.65
ISK kr13,804.61
List Price:  ISK kr19,327.01
You save:  ISK kr5,522.40
DH983.39
List Price:  DH1,376.78
You save:  DH393.39
L1,772.18
List Price:  L2,481.12
You save:  L708.94
ден5,666.48
List Price:  ден7,933.30
You save:  ден2,266.82
MOP$803.68
List Price:  MOP$1,125.19
You save:  MOP$321.50
N$1,822.62
List Price:  N$2,551.75
You save:  N$729.12
C$3,674.96
List Price:  C$5,145.09
You save:  C$1,470.13
रु13,374.15
List Price:  रु18,724.35
You save:  रु5,350.19
S/375.27
List Price:  S/525.40
You save:  S/150.12
K391.68
List Price:  K548.37
You save:  K156.68
SAR375.13
List Price:  SAR525.20
You save:  SAR150.06
ZK2,608.33
List Price:  ZK3,651.77
You save:  ZK1,043.43
L458.02
List Price:  L641.25
You save:  L183.22
Kč2,333.92
List Price:  Kč3,267.59
You save:  Kč933.66
Ft36,019.89
List Price:  Ft50,429.29
You save:  Ft14,409.40
SEK kr1,081.31
List Price:  SEK kr1,513.87
You save:  SEK kr432.56
ARS$92,801.80
List Price:  ARS$129,926.23
You save:  ARS$37,124.43
Bs690.67
List Price:  Bs966.96
You save:  Bs276.29
COP$403,758.19
List Price:  COP$565,277.62
You save:  COP$161,519.43
₡52,788.48
List Price:  ₡73,905.99
You save:  ₡21,117.50
L2,471.95
List Price:  L3,460.83
You save:  L988.88
₲756,054.93
List Price:  ₲1,058,507.15
You save:  ₲302,452.21
$U4,029.23
List Price:  $U5,641.08
You save:  $U1,611.85
zł393.63
List Price:  zł551.10
You save:  zł157.46
Already have an account? Log In

Transcript

Hi, and welcome to this lecture. In this lecture, you're going to learn how to handle various file types like CSV, txt, or JSON or even Excel files, Excel spreadsheets, using pandas and pandas specific methods. In order to test the real power of the pandas module, I have created this sample folder right here called sample underscore data on my D drive, in which I have saved five files containing some employee data. And let's see the contents of these files real quick. Basically, what you should know about these files, which by the way you can find attached to this lecture is that they all contain the exact same information about 10 employees in the company, along with the column headers, as you can see, for example, in the txt file right here, so if we check the folder Once again, we have employees dot CSV, employees dot JSON dot txt dot XLS x and finally employees to dot txt and you're going to see why we have two dot txt files shortly.

For now, let's take these files one by one. First of all, as I said, we have employees dot txt. On the first row of this file, we have the column headers. And then we have 10 employees the data for 10 employees, where the data corresponding to each column is separated by comma from the rest of the data from the rest of the columns. By the way, notice that if you have a comma inside one of the values itself, then you need to add double quotes around the value so that the comma inside the string will not be regarded as the delimiter. Okay.

Next we have employees to dot txt, which contains the exact same information only this time instead of the comma We have this vertical bar as a delimiter. Next we have employees dot CSV, which is another type of file another extension for a file containing some data, we again have the comma as a delimiter. The only thing that differs from employees dot txt is the extension as I said, dot CSV, and we are going to see how to load read and handle this kind of file as well using pandas. Next, we have employees dot JSON, which is actually a list of dictionaries, where each dictionary is actually a row in the table. And the key value pairs inside the dictionary contain a key which is the column name and the value for that row, which is the value in this key value pair. Finally, we have employees dot XLS x, which is an Excel spreadsheet.

Again, the data is basically the same, only the file format de force Excel instead of txt or CSV. Okay, so we have these four file types, file extensions file formats that we will load and read using pandas in the rest of this video. So let's get to work. Let me minimize this. And I'm going to move this as well out of the way. And now we are left with the Jupiter notebook that we previously used in this section.

Of course, I have deleted all the sales from previous videos, and now we have an empty notebook. First of all, as always, the first thing we should do is to import pandas, Shift Enter. And now let's take care of the txt file. Actually the txt files because we have two txt files, one with a comma as the delimiter and one with the vertical bar as a delimiter. Now let's load our first txt file. So let's create a variable D. txt equals pandas dot and in order to load and Read txt and CSV files with pandas.

We are using the same method called read underscore CSV. So pandas dot read CSV, open and close parentheses, and inside the parentheses we should enter the path to the file to the text file that we want to load. In my case, that would be the colon, backslash backslash, sample underscore data, double backslash employees dot txt. Okay, Shift Enter. And now let's see the txt. Actually, we're going to see if the data inside the employees txt file has been properly loaded using pandas and Shift Enter.

And you can see right here that we have a much better looking set of data instead of having it in a txt or a CSV. file format. You can notice these are the column names that we previously had defined inside the txt file on the first row. And also pandas has added this indexing system from zero to nine for the rows in the table. And we are going to discuss more about indexing and slicing in an upcoming video. For now let's just focus on ways of loading data from external files into pandas into our Python applications.

Okay, now what if we want to load the second txt file, the one that had the vertical bar as the delimiter. In this case, we should specify that the comma is not the delimiter anymore. And for this, we will need to add a special argument and let's see it in action. So I'm going to copy this and paste it in a new cell. I'm going to name this D txt two pandas dot read CSV and we have employees To dot txt and after the path to the file, we insert a comma, and then the special argument called delimiter. equals, and now we define the delimiter, which is the vertical bar on the keyboard.

That's it, Shift Enter. Now let's see the txt two. And you can notice that the information in the table is basically identical, because the only difference between the two files was the delimiter. Okay, so we've seen how to read and load data from a txt file using pandas. Now, let's move over to loading CSV files. Basically, as I said earlier, we will need to use the same method from within the pandas module, namely read CSV.

So let me paste this once again. Instead of the txt, let's type in the CSV and we will have pandas dot read CSV, the path to the file employees dot c As v instead of txt, and now let's hit Enter, and also import the CSV, Shift Enter. And again, as you can see, the table is basically the same, the information is the same, only that this time we have loaded a CSV file instead of a txt file. Another thing we can do is to ignore the column names that we had in the file on the first row by removing the header. In order to do that, we just have to add another argument. So I'm going to copy this and paste it in a new cell.

And now I'm going to add this argument called Heather equals none. Okay, now, let me hit enter and also type in D, CSV, Shift Enter Now, and as you can see, at this point, the header was ignored, and is now treated as the first row in the table and the column headers have been switched to the default column headers, zero through seven. In our case, since we have eight columns, the rest of the data stays the same. Only that as I said, we have a new row in the table containing the former column headers. Okay, what else we can do, we can also ignore the header as we did just now and set alternative names for the columns. So let me copy this once again and paste it in a new cell.

We have header equals non comma. And now we have yet another argument called names, equals, and now we have a list here where each element will be the new name of each of the eight columns in our table. So in my case, let me copy and paste this. So instead of the initial column headers, we will have a B, C, D, E, F, G, and H. Okay, I'll hit enter and DCs V, Shift Enter To execute And as you can see, the column names have been changed, which is what we expected. But what if we want to add a prefix to each of the column names, for instance, we are going to set the header to none once again, so that we will have the default column labels zero through seven in our case, but we want to prefix each of these labels with let's say, string, co l from column.

So we will have C 00, l one and so on instead of 01 all the way through seven. So let's see how to do that. I'm going to copy this right here. So we have header equals none at this point, comma, prefix, which is yet another argument equals and now let's define the string. As I said, c o l from column now the CSV and Shift Enter, and this time we have called zero call one and so on. way up to code seven instead of the defaults, zero through seven.

Now let's get back to our initial table by simply reading the CSV file, the CSV, Shift Enter. Okay, so this is the initial table. At this point, the index of this table is this default zero through nine. And again, we will talk more about indexing very shortly in this section. Basically, this indexing system helps you reference some of the rows in the table by specifying the index itself, or a slice containing a start and stop index, but more on that later. Or if we want to reference the rows in this table by another column, not by this default zero through nine indexing system, then we can use the set index method to change the index of the table.

So we would have the CSV dot set underscore index and in between the parameter is an index and also in between double quotes, we will enter the name of the column that we want to be regarded as the table index as the new table index. So let's say we want column ID. And now let's also see the CSV. Once again, notice that we still have this indexing system zero through nine. So in order to modify the data frame itself, to make the change in place, we would have to use the same syntax as we did up here with a slight twist. So we would have this CSV dot set index of ID comma in place, equals true.

Okay, and now let's print out the CSV once again, Shift Enter. And now as you can see, the default index has disappeared, and the ID column is now the index of this table. On the same logic, we can yet again modify the index of this table. For instance, let's Say DCs v dot set underscore index of let's say the address column, comma in place, equals true. And now let's hit enter and the CSV, shift, Enter. And now you can see the address being the index of this table.

And now you can reference the rows inside the table by the value in the address column. Now, let's get back to our initial data frame. This one right here. So we have the CSV file loaded and the CSV. Let's see it once again. Okay, this is the initial table.

What else can we do? For example, we can find out the shape, which means the number of rows and columns of A table have a data frame. And to do that, we just have to enter the CSV dot shape, Shift Enter. And you can see we have 10 comma eight, which means 10 rows and eight columns. Okay, let's see the CSV once again. What else can we do for example, we can skip one or more lines at the start of the file when printing out the table.

And in order to do that, I'm going to use the skip rows argument. So we have skip rows equals four. And now let's see the CSV once again, Shift Enter. And you can notice that the first four rows have been skipped when printing out this table to the screen. On the other hand, we can also specify the number of rows in the file to read. And this is especially useful when reading pieces of large data files.

For example, let's say we want only the first five rows of the table to be shown. In this case, we will add one other argument called n rows equals five. Now let's see the CSV once again, Shift Enter. And again, we have only the five first row In the table printed out to the screen, Okay, enough about the CSV files. Now let's move over to JSON files. In order to load and read a JSON file into pandas, we will need to use the read JSON method.

So let me write this. Let's create another variable the JSON equals pandas dot read underscore, Jason. And of course in between the parentheses of this method, we will use the path to the file in between double quotes of course, let me paste this in and I will modify the extension only. So we have employees dot JSON. Okay, and now let's see the JSON, Shift Enter. And as you can notice, the information is basically the same as we had in the previous text file slash CSV file.

The last thing I will show you in this video is how to To load and read XLS X Files meaning Excel spreadsheets. For this, first of all, you will need to install an additional module that pandas needs in order to handle XLS X Files. And that module is called XLR D. So to install that module, make sure you're connected to the internet. And then in the windows command line, simply type in pip install x, l, or D. I have already installed this module, so I won't do it again. However, you should go ahead and do it right now. And then resume the video.

Assuming you did that. Now let's get back to our Jupiter notebook. And let's load an Excel file from our directory from the same directory sample underscore data using the read underscore Excel special method. So I'm going to write d x LS x equals pandas dot read underscore x And in between the parentheses of Excel and also in between double quotes, we need to enter the path to the file to the Excel spreadsheet. So I'm going to modify the extension here to load and read these employees dot XLS x file in my directory, so XLS x. Okay, let's hit Enter.

And let's also print out the XLS x. And we have successfully loaded and read the data inside the Excel file using yet another special method inside the pandas module. By the way, if you have multiple sheets inside your Excel file, then you can specify what sheet you want to load into this data frame. To specify the sheet you will need to use its index inside the Excel file. For example, the first sheet would have index zero, the second sheet would have index one and so on. For example, if I copy it paste this.

To specify the sheet, simply insert a comma and then sheet underscore name equals zero for the first sheet as I said, enter the XLS x, Shift Enter, and the file has been loaded successfully. Okay, I hope you enjoyed this lecture and I will see you in the next one to discuss reading and loading HTML content using pandas. See you there.

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.