Python 3: Automating Your Job Tasks Superhero Level: Automate Data Analysis Tasks with Python 3

By: Mihai Catalin Teodosiu

18 minutes

Share the link to this page

Copied

Facebook

Twitter

Add the class to your calendar

Add to Google Calendar

Add to Apple Calendar

Add to Yahoo Calendar

Add to Outlook Calendar

Completed

You need to have access to the item to view this lesson.

One-time Fee

$99.99

List Price: $139.99

You save: $40

€92.10

List Price: €128.94

You save: €36.84

£77.67

List Price: £108.74

You save: £31.07

CA$138.12

List Price: CA$193.37

You save: CA$55.25

A$152.38

List Price: A$213.34

You save: A$60.96

S$134.35

List Price: S$188.10

You save: S$53.74

HK$780.56

List Price: HK$1,092.81

You save: HK$312.25

CHF 88.35

List Price: CHF 123.69

You save: CHF 35.34

NOK kr1,098.66

List Price: NOK kr1,538.17

You save: NOK kr439.51

DKK kr687.32

List Price: DKK kr962.28

You save: DKK kr274.95

NZ$169.58

List Price: NZ$237.42

You save: NZ$67.84

د.إ367.26

List Price: د.إ514.18

You save: د.إ146.91

৳11,746.62

List Price: ৳16,445.74

You save: ৳4,699.11

₹8,371.76

List Price: ₹11,720.80

You save: ₹3,349.03

RM465.70

List Price: RM652

You save: RM186.30

₦159,584.04

List Price: ₦223,424.04

You save: ₦63,840

₨27,830.93

List Price: ₨38,964.42

You save: ₨11,133.48

฿3,601.18

List Price: ฿5,041.80

You save: ฿1,440.62

₺3,299.80

List Price: ₺4,619.86

You save: ₺1,320.05

B$564.45

List Price: B$790.25

You save: B$225.80

R1,825.21

List Price: R2,555.37

You save: R730.15

Лв180.14

List Price: Лв252.21

You save: Лв72.06

₩138,584.75

List Price: ₩194,024.19

You save: ₩55,439.44

₪365.84

List Price: ₪512.19

You save: ₪146.35

₱5,843.06

List Price: ₱8,180.52

You save: ₱2,337.46

¥15,455.24

List Price: ¥21,637.96

You save: ¥6,182.71

MX$1,831.18

List Price: MX$2,563.73

You save: MX$732.54

QR364.64

List Price: QR510.52

You save: QR145.87

P1,361.94

List Price: P1,906.78

You save: P544.83

KSh13,073.69

List Price: KSh18,303.69

You save: KSh5,230

E£4,833.36

List Price: E£6,766.90

You save: E£1,933.54

ብር5,779.16

List Price: ብር8,091.05

You save: ብር2,311.89

Kz87,254.87

List Price: Kz122,160.31

You save: Kz34,905.44

CLP$94,704.52

List Price: CLP$132,590.12

You save: CLP$37,885.60

CN¥725.11

List Price: CN¥1,015.19

You save: CN¥290.07

RD$5,921.37

List Price: RD$8,290.16

You save: RD$2,368.78

DA13,436.79

List Price: DA18,812.04

You save: DA5,375.25

FJ$225.82

List Price: FJ$316.16

You save: FJ$90.33

Q775.39

List Price: Q1,085.58

You save: Q310.18

GY$20,916.55

List Price: GY$29,284.01

You save: GY$8,367.45

ISK kr13,788.62

List Price: ISK kr19,304.62

You save: ISK kr5,516

DH983.22

List Price: DH1,376.54

You save: DH393.32

L1,774.76

List Price: L2,484.74

You save: L709.97

ден5,676.16

List Price: ден7,946.86

You save: ден2,270.69

MOP$803.68

List Price: MOP$1,125.18

You save: MOP$321.50

N$1,847.94

List Price: N$2,587.19

You save: N$739.25

C$3,679.41

List Price: C$5,151.32

You save: C$1,471.91

रु13,388.03

List Price: रु18,743.78

You save: रु5,355.75

S/374.11

List Price: S/523.77

You save: S/149.65

K386.64

List Price: K541.31

You save: K154.67

SAR375.13

List Price: SAR525.19

You save: SAR150.06

ZK2,610.12

List Price: ZK3,654.27

You save: ZK1,044.15

L457.84

List Price: L641

You save: L183.15

Kč2,336.96

List Price: Kč3,271.83

You save: Kč934.87

Ft35,994.10

List Price: Ft50,393.18

You save: Ft14,399.08

SEK kr1,079.53

List Price: SEK kr1,511.39

You save: SEK kr431.85

ARS$92,940.28

List Price: ARS$130,120.11

You save: ARS$37,179.83

Bs690.73

List Price: Bs967.06

You save: Bs276.32

COP$404,134.11

List Price: COP$565,803.93

You save: COP$161,669.81

₡52,857.56

List Price: ₡74,002.70

You save: ₡21,145.14

L2,475.36

List Price: L3,465.60

You save: L990.24

₲758,175.48

List Price: ₲1,061,476.01

You save: ₲303,300.52

$U4,025.98

List Price: $U5,636.54

You save: $U1,610.55

zł393.60

List Price: zł551.06

You save: zł157.45

Already have an account? Log In

Transcript

Hi, and welcome to this final video of the data analysis section. During this video, we are going to analyze a Python application line by line. And I strongly suggest focusing your efforts and attention and writing each line of code yourself by pausing the video when needed. Also, please keep in mind that in the next lecture, the one following this video, you can download the full code of this application, as well as all the additional resources I'm going to use. Now since I'm going to use sample txt, CSV, JSON and Excel files. To test this application, you can go ahead and download and save them to a folder on your computer and use them for your upcoming application tests just so we can be on the same page.

One more thing in this video I'm going to use the Postgres SQL database we previously built in the database automation section of the course. Read some data from and also write a new table to. So if you skip that section, pause the video now and go through the videos and resources in the database automation section to fully understand what we're about to do here. Now assuming that you've gone through that section and have the Postgres SQL database available, I'm going to carry on with the rest of this video. Next, let me briefly describe what is the role of the application you're going to write until the end of this video, let's assume that you are the owner of a company and this company has grown in time from two employees to over 15. And as time has gone by, you haven't felt the need for a human resources team that would have kept the employee data organized and updated into a central database.

Instead, you have your employee data scattered across various files, like txt, CSV, JSON and Excel, and you even have some of the data in an old book. Greet SQL database table. Now you wants to clean up this mess and join all of these pieces of data into a new centralized SQL table containing all your current employees. Moreover, after solving this task, you also want to query the new database table and build a summary report containing basic statistics about your team and save that report in a nice clean HTML table that you can then perhaps upload to your management internet web page. Now before diving into the application code, let me show you each of the pieces of information that we need to join and write to the central database. As you can see, I have all of these files in a separate folder I created on my D drive.

Again, you can download these files as well as the code from the lecture following this video. Go ahead and do that now. Save them to a folder of your choice, and then resume this video. Okay, assuming you did that, let me know are each of these files and show you the contents. So as I mentioned earlier, we have data regarding various employees scattered across these four files, these three files right here and the Excel spreadsheet, and also in an old Postgres SQL database table. So I'm going to open up the SQL, and let's connect to the existing database, the one from the database automation section, and then see the contents of the table.

So this is my instance of the SQL right here. Let me increase the font. Okay, so first of all, let's connect to the staff database. So backslash c staff, and the semi colon. Okay, we are now connected. Now let's see the available schemas and make sure our old schema is still there.

So backslash d n. And indeed We have my staff. Great. Now let's check if the old table containing some employees record is still available here. So backslash dt, my staff, the name of the schema.as, an asterisk. And indeed We have the employees table right here. Finally, let's see its contents by performing a simple SQL query.

So select all from my staff, dot employees. Okay, so inside this SQL table, we have six employees registered. Then we have three more employees registered in the txt file right here, two more in the CSV file. And finally, two more in the JSON file. Of course, let's not forget about these three employees which are registered in the Excel spreadsheet. So that would be a total of 16 employees distributed across four file types and an SQL table.

Good. We will join all 16 of them into a single new table inside the same Postgres SQL database, and now it's time to see the code of this application. First of all, as always, we need to import the necessary modules that will help us achieve our goals. And we have the pandas module and the SQL alchemy module. SQL alchemy, helps us interact with an SQL database, Postgres SQL, in our case, from within a panda's method. So for instance, down here, we are able to read an SQL table from the Postgres SQL database using the read SQL table method right after connecting to the database.

Therefore, we need to install SQL alchemy, as we will do with any other Python module. So let's open up the windows command line and type in pip install SQL alchemy. I've already done this, but you should pause the video and do it before moving on. By the way, you will find a link to the official documentation SQL alchemy attached to this lecture to find out more about this library if you need to. I chose to import only the Create engine method from within the SQL alchemy module, and not the entire module since this is the only method we need for our application. Next, we need to create five data frames for the five containers where our scatter data is stored.

Of course, you've already learned how to do that earlier in this section. So in short, this is the code for the txt and CSV files. We use the read CSV method from within the pandas module, and the name of the file as an argument. Next, for the JSON file, we use the read Jason method. And similarly for the Excel spreadsheet, we use the read Excel method to load the file as a panda's object. Finally, the fifth place where we have employee data stored is the old Postgres SQL database.

Table I showed you earlier called employees. In order to connect to the database and read the information in the table, we first have to create an engine with the Create engine method. I'm going to assign this instance to a variable called simply engine. In between the parentheses of the Create engine method, we have to input a string. As a side note, please notice this link inside the comments that will provide more information on how to create and configure an engine. For now let's analyze the string inside the parentheses.

So this string right here, first we have Postgres SQL. In the terminology of the SQL alchemy module. This is called the dialect, meaning the flavor of the SQL database. This can be SQL lite, my SQL, Oracle, MS SQL, or in our case, Postgres SQL, obviously. Next, after the plus sign, we have what's called the driver name. The name of the database API to be used to connect to the database in all lowercase letters.

If the driver name is not specified, then a default driver will be imported if available. In our case, that would be the psycho PG two driver we used in the database automation section of the course. However, for the past Gree SQL dialect, this is also the default driver name that SQL alchemy picks. So I could skip this, but I wanted you to know this piece of information as well. Next, we have a colon and two forward slashes followed by the username Mihai, and the password which is Python separated by a colon, these being the username and password used to connect to the database. Then we have the Add sign right here, followed by the IP address or hostname of the database server.

In my case, this is the localhost or 127 dot zero dot Zero dot one. And finally after the port number, we also have to specify the name of the database to connect to. Okay, this is pretty straightforward, I think. Next on this line right here, we are creating a new object called D SQL, that will load the information from the table we specify as its first argument. So from the employees table, then the second argument is the engine itself. As you can see right here, the connection to the database.

And last but not least, we have the name of the schema that the employees table belongs to. All of this is accomplished using the read SQL table method. Now, let me explain the next line right here. So this one, if we go back to our table to our old table, called employees, we notice that the column names are all in lowercase letters, while for example, the column names in the Excel file or in the txt file, these ones right here are all capitalized or even in camel case, this difference in column names will generate a conflict. Therefore an exception when we are going to join the data from all five sources into the final SQL table, trust me on that it will happen if you forget to rename the columns of this data frame, as I'm doing right here in the application. To do that pandas provides the Rename methods.

This one in between the parentheses of rename the first argument is a dictionary. As you can see right here, each element of the dictionary being a key value pair, where the key is the old column name, and the value is the new column name. And we do this for all the columns in the table. Then we have two additional arguments, axes equals columns, which specifies the fact that we are in a In the column labels of the table, and then we have in place equals true, which modifies the data frame in place, performing the design changes on the object itself. The next thing to do, as you can see right here is to write the data from all five data frames to a new Postgres SQL table. This is easily done using the to SQL method from within the pandas module.

So for each data frame, we are using the name of the object, a dot, and then the to SQL method. And then in between its parentheses, we have the following arguments, the name of the new table to be created in the database, I called it all staff, then the name of the object representing the engine. So engine, then we have the name of the schema where this table should be created. In my case, schema is my step. Next we have index equals false. This setting makes sure that we Keep the column labels as they are in the data frame and not write the data frame default index as a column.

Now for the first set of data being written to the table, we don't need the if exists equals append argument. Since the table does not yet exist, we are creating it right here on this line when using the to SQL method for the first time in the application, for the next calls of to SQL, we need to specify the exists argument and set it to append because at this point, the table has already been created. And the only thing we want is to append the data from each of the remaining four sources to the data already written by the first to SQL call. Next, it's time to extract some relevant data from the newly created SQL table to build the final report. To do that, we're going to use a special method from within pandas called Read SQL query. As its name implies, we can use this method and the engine we created earlier to perform various SQL queries on our table.

Of course, at this point, your imagination and also your actual needs come into play. I have included some basic SQL queries to extract various statistics about the employees in the company. Also, for some of the data are used either the eye lock, as you can see right here, or the lock methods to either extract a specific value using indexes, or to get a value using labels or label based indexing. Of course, to use label based indexing, I needed to previously set the department as a permanent index for the table using the set index method to be able to reference a value by the department name. So my employees that include the total number of employees and departments, the number of employees per department And then using the max min and min methods right here. I'm also including the highest and lowest salary in the company, as well as the average salary across the company.

I'm not going to analyze each of these lines of code in detail since they are pretty straightforward. And the SQL query syntax is beyond the scope of this course. It's up to you to create your own queries according to your needs, and then use the lock or ILOG methods to extract the values you need. Okay, the next thing I did is I created a list of lists right here, where each enter list is going to be a row in the table inside the final report. The first value inside each of these lists, this string is going to sit inside the first column of the table, and then the second value. This one right here will be an entry in the second column of the final report.

Notice that each of Second values in this list is a variable from above total employees, log EMP, salary, high salary, low salary, a VG and so on, converted to an integer so that we can calculate the minimum maximum and average salary. You're going to see this very soon. Now having this list of lists, we need to create a final data frame, as you can see on this line right here, by passing two important arguments, the name of the variable containing the list of lists, so that would be summary and the column names for the two columns. Otherwise, we will end up with the default column headers named zero and one, which is not so eyecatching inside the report. Once this final data frame has been created, the only thing left to do is to write it to an HTML format. To do that, we first have to create the file itself on the local file system.

And I'm doing that inside the application by opening the file inside The specified folder using the W file access mode, which creates the file if it doesn't yet exist. Finally, using the to HTML method from within pandas, I am writing the contents of the data frame to the newly created file, also overwriting the default index setting from true to false, as you can see right here, and configuring the text inside each table cell to be centered. Now the big moment, it's time to test the application. So let's check that there's no table named all staff in the database. I'm going to use select all from my staff, dot all staff, and indeed this table does not exist. Also, let's check the folder on my D drive to see that there is no HTML file in there.

This is the folder no HTML reports inside this folder. And now let's run The application so I'm going to open up the windows command line. And I'm going to type in Python, D, colon backslash. And now the name of my application pandas app dot P, y, enter. Okay, no errors. Now let's first check the database once again to see if the new table has been created, and that it contains all 16 employees gathered from the five sources.

So I'm going to open up SQL once again and perform the same query. And this time, this is the table we see 16 rows. Great. Finally, let's see the HTML report file. I'm going back to this folder. And now you can see that this summary dot html file has been created.

Let's open it using the Firefox browser for instance, open with Firefox, and there is our pretty little table. nicely formatted and containing important statistics about our employees obtained As a result of collecting employee data scattered across text files, CSV files, JSON files, Excel spreadsheets, and possibly SQL tables. Okay, I hope you enjoyed this section and I will see you in the upcoming updates to this course. Bye

Python 3: Automating Your Job Tasks

APPLICATION - Reading and Writing Data in PostgreSQL Databases Using Pandas

Transcript

Sign Up

Sign Up

Share