Logistic Regression Practical, Part 5

SAS Analytics Logistic Regression: Case Study & Practical
13 minutes

Transcript

We were doing the practical session on logistic regression in SAS, using a data set related to credit risk analytics. We were building a model for Y = 1; that is, the model calculates the probability of Y = 1, the probability of an event. In our model, Y = 1, the event, denotes that the customer will not be a loan defaulter, and Y = 0, the non-event, denotes that the customer will be a loan defaulter. So the probability of Y = 1 is the probability that the customer will not default, and therefore the loan can be given to the customer, and the probability of Y = 0 is the probability that the customer will default, so the loan cannot be given. We have done stepwise selection on our model to select the significant independent variables to carry forward, and we have obtained 14 significant independent variables out of the 30 independent variables in the data set.

The significant independent variables that were selected are check account, duration, history, savings account, new car, education, credit amount, single, other installment, installment amount, used car, foreign, and rent. These significant variables are included in the model with the help of the residual chi-square test; that is, whether any more independent variables should be included in our model is decided by the residual chi-square test, where H0, the null hypothesis, is that the model does not require any more variables, and H1 is that the model requires more variables. We have also done the Hosmer-Lemeshow goodness-of-fit test, where we got a p-value greater than 0.05; that is, we accepted the null hypothesis for our model. For the Hosmer-Lemeshow goodness-of-fit test, H0 is that the model is a good fit and H1 is that the model is not a good fit, so for our model the conclusion is that the model is a good fit.
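A minimal sketch of the PROC LOGISTIC step being described here (the data set and variable names are placeholders, not the exact names from the course files; EVENT='1' makes SAS model the probability of Y = 1, SELECTION=STEPWISE runs the stepwise selection with its residual chi-square test, and LACKFIT requests the Hosmer-Lemeshow test):

   proc logistic data=work.credit;
      /* in practice all 30 candidate variables go on the right-hand side */
      model response(event='1') = check_account duration history sav_account
                                  new_car education amount single other_install
                                  install_amount used_car foreign rent
            / selection=stepwise   /* stepwise variable selection */
              slentry=0.05 slstay=0.05
              lackfit;             /* Hosmer-Lemeshow goodness-of-fit test */
   run;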

We have also obtained the table of odds ratio estimates and the table of the analysis of maximum likelihood estimates; that is, the maximum likelihood estimation technique is used to estimate the parameters. We have also obtained the table of the percentages of concordant, discordant, and tied pairs; the higher the percentage of concordance, the better the model, because there is less misclassification. We have also generated the classification table at every probability level from zero to one with a gap of 0.01, and we got the different measures of the classification table at every one of those levels: the correctly classified events, the correctly classified non-events, the incorrectly classified events, the incorrectly classified non-events, the total percentage correctly classified, the sensitivity, the specificity, the false positive rate, and the false negative rate.

We have created a data set called result, where we have predicted the probability of Y = 1; that is, for each customer we have predicted the probability of the event, the probability of giving the loan to the customer. This predicted probability is displayed in the data set called result. So this is my result data set, where I have the estimated probability for each and every customer who applied for the loan. This is the estimated probability of an event, that is, the probability of giving the loan to each customer. Then we set up a cutoff probability level in order to convert this estimated probability into a binary variable taking the values zero or one: if the estimated probability is greater than the cutoff probability level, which we have set as 0.5, then the value of status is one; otherwise the value of status is zero.
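Both the classification table and the result data set can be requested from the same PROC LOGISTIC step; a sketch, continuing with the placeholder names above (CTABLE PPROB= produces the classification table at every cutoff from 0 to 1, and the OUTPUT statement writes the estimated probability for each customer to the result data set):

   proc logistic data=work.credit;
      model response(event='1') = check_account duration history sav_account
                                  new_car education amount single other_install
                                  install_amount used_car foreign rent
            / ctable pprob=(0 to 1 by 0.01);  /* classification table at each cutoff */
      output out=result p=pred_prob;          /* estimated P(Y = 1) per customer */
   run;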

So the status variable is created in another data set called pred_result, which is a duplicate of the result data set. The status variable is created beside the variable of estimated probabilities, so for every value of the probability there is a value of status: if the estimated probability is greater than 0.5, which is the cutoff probability level, the status value is one; otherwise it is zero. When we say the estimated probability is greater than 0.5, that means the estimated probability of giving the loan to that particular customer is greater than 0.5, so we say the status is one; a customer with an estimated probability greater than the cutoff has more chance that the loan will be given. Therefore the status is one, because one is an event. Conversely, if a customer has an estimated probability less than 0.5, that is, less than the cutoff level, which we have decided here as 0.5 (we can change the cutoff level according to our choice), then we say the status is zero; that is, there is less chance that the customer will get the loan.
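A minimal sketch of that data step, assuming the predicted-probability column is named pred_prob as above:

   data pred_result;
      set result;                           /* duplicate of the result data set */
      if pred_prob > 0.5 then status = 1;   /* event: loan likely to be given   */
      else status = 0;                      /* non-event                        */
   run;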

Here one denotes an event and zero denotes a non-event. Now we are going to measure the accuracy of the model that we have built, and to do that we will be generating a confusion matrix for our model. So in this video we will generate the confusion matrix for our model, to measure the accuracy of the model and also the different measures of the confusion matrix. Let's start. We are going to use the procedure called PROC FREQ, the frequency procedure: proc freq data=pred_result.

This is the duplicate data set, which was already created inside the work library, containing the predicted probability and the status variable, the binary variable taking the values 0 and 1. The confusion matrix will be formed between the observed values and the predicted values. In my pred_result data set, the observed values are in the response variable and the predicted values are in the status variable. So I am going to execute the frequency procedure on the data set pred_result between the response and status variables to measure the accuracy of the model: proc freq data=pred_result. Then we use the keyword TABLES to specify the variables: response is the variable containing the observed values; we are making a cross-frequency table, therefore we use a star; and status is the predicted variable. Both response and status are binary variables taking the values 0 and 1. We do not want the row percentages and column percentages, which come by default in PROC FREQ, so we have used the keywords NOROW and NOCOL, and then we use the RUN statement.

So, before I execute the code, let me explain it. We have used the frequency procedure on our data set pred_result, which contains the observed variable response and the status variable; that is, we set up a cutoff probability level and converted the estimated probability into the status variable, so that if the estimated probability is greater than 0.5 then status equals one, otherwise status equals zero. The response event here is one, because we are building a model for Y = 1. To form the confusion matrix between response and status, we have specified response*status in the TABLES statement, a cross-frequency table. We have written NOROW and NOCOL because we are only concerned with the frequencies; we do not need the row percentages and column percentages. And then RUN.
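Put together, the step being described is (a sketch, assuming the data set and variable names used above):

   proc freq data=pred_result;
      tables response*status / norow nocol;  /* observed vs. predicted cross-frequency */
   run;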

So let's run this code. See, this is my confusion matrix. Here response is the observed variable, with observed value zero and observed value one, and status is my predicted variable, where I have converted the estimated probability into status, also a binary variable taking the values 0 and 1. The four cell counts are 155, 145, 72, and 628; the first value in every cell is the frequency and the next value is the frequency percent. The (0, 0) cell is 155, meaning there are 155 cases where the observed value is zero and the predicted value is zero.

Next, there are 145 cases where the observed value is zero and the predicted value is one, 72 cases where the observed value is one and the predicted value is zero, and 628 cases where the observed value is one and the predicted value is one. So my total number of observations is one thousand. Now we will use this confusion matrix to measure the accuracy of the model and to calculate the different other measures of the confusion matrix, and that we have to do manually. So I have taken this confusion matrix into an Excel file; it is the same confusion matrix that I showed you in the results window. My response variable is the observed variable and my status variable is the predicted variable. The first measure of the confusion matrix that we are going to calculate is the accuracy of the model, the percentage correctly classified, which is equal to the correctly classified events plus the correctly classified non-events, divided by the total number of observations.
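Laid out as a two-by-two table, the counts just read off are:

                             Predicted (status)
                                0        1    Total
   Observed (response)  0     155      145      300
                        1      72      628      700
   Total                      227      773     1000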

Multiplying correctly classified events plus correctly classified non-events divided by the total number of observations by one hundred converts it into a percentage. Here the correctly classified events are the 628 cases that are observed as one and also predicted as one, and the correctly classified non-events are the 155 cases, so it is 628 plus 155 divided by the total number of observations, one thousand. I haven't converted it into a percentage; I've just done the total, (628 + 155) / 1000, which is about 0.78, that is, around 78.3%. So the accuracy is good. Next, the correctly classified events, the 628 cases in the (1, 1) cell, are also called true positives, and the correctly classified non-events, the 155 cases in the (0, 0) cell, are also called true negatives. Next are the incorrectly classified events.

The incorrectly classified events are the (0, 1) cell, observed as zero but predicted as one, 145 cases, also called false positives; and the incorrectly classified non-events are the (1, 0) cell, 72 cases, also called false negatives. Next is sensitivity, which is also called the true positive rate: it is the ratio of correctly classified events to total observed events. The correctly classified events are 628 and the total observed events are 700, so the sensitivity is 628 / 700, which is about 0.90. Specificity is the correctly classified non-events divided by the total observed non-events; the correctly classified non-events are 155 and the total observed non-events are 300, so it is 155 / 300, that is, about 0.51, or 51%. The false positive rate is the ratio of incorrectly classified events to total predicted events.

The incorrectly classified events are 145 and the total predicted events are 773, so the false positive rate is 145 / 773, which is about 0.19. The false negative rate is the incorrectly classified non-events divided by the total predicted non-events; the incorrectly classified non-events are 72 and the total predicted non-events are 227, so it is 72 / 227, which is around 0.31, or about 31%. These are the different measures of the confusion matrix that we have to calculate: once we get the confusion matrix in SAS, we calculate these measures in order to measure the accuracy of the model. Since the accuracy of the model is about 78%, it is quite good. That is what we have learned in this video.
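As a cross-check, a short data step (a sketch, using the four cell counts read off above and the definitions used in this lesson) reproduces all of these measures:

   data measures;
      tp = 628; tn = 155; fp = 145; fn = 72;  /* confusion matrix cells     */
      total       = tp + tn + fp + fn;        /* 1000 observations          */
      accuracy    = (tp + tn) / total;        /* (628 + 155) / 1000 = 0.783 */
      sensitivity = tp / (tp + fn);           /* 628 / 700 = 0.897          */
      specificity = tn / (tn + fp);           /* 155 / 300 = 0.517          */
      fp_rate     = fp / (tp + fp);           /* 145 / 773 = 0.188, relative to predicted events, per this lesson */
      fn_rate     = fn / (tn + fn);           /* 72 / 227 = 0.317, relative to predicted non-events */
   run;

   proc print data=measures; run;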

So, before I move on to the next video, let me recap the concepts that we have covered in these videos. We did stepwise selection to select the significant variables; we generated the classification table at every probability level from zero to one with a gap of 0.01; we predicted the probability of Y = 1; we set up the cutoff probability level and converted the predicted probability into a binary variable named status; and we formed the confusion matrix between the observed variable and the predicted variable. We got the odds ratio estimates table and the analysis of maximum likelihood estimates table, we used the residual chi-square test for selecting the significant variables, and we did the Hosmer-Lemeshow goodness-of-fit test, which concluded that our model is a good fit. Now, in an upcoming video we will be doing time series analysis; the practical session on regression is over.

In our upcoming video we'll be starting time series analysis. Goodbye, thank you, and see you in the next video.
