Practical Applications of Logistic Regression in SAS

Clinical Data Management Using SAS Categorical Data Analysis
15 minutes

Transcript

Welcome to the Clinical Data Management program using SAS. In this video we will discuss the practical applications of logistic regression. First, let's look at the case study we are going to solve with logistic regression in SAS: different parameters of a tumour are given, and depending on those parameters we will decide whether the cancer is benign (not harmful) or malignant (harmful), where malignant is coded as 1 and benign as 0. In logistic regression, as you know, we calculate the probability that Y equals 1; here Y equals 1 means the cancer is malignant and Y equals 0 means the cancer is benign, that is, not harmful.
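As a quick written reference (this formula is not shown in the video), logistic regression models that probability as a sigmoid function of the predictors, where b0, b1, ..., bk are the coefficients PROC LOGISTIC estimates from the data:

    P(Y = 1) = 1 / (1 + exp(-(b0 + b1*x1 + b2*x2 + ... + bk*xk)))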

So let's move to SAS to do it practically. For this, first let's import the data set. To import the data set we'll use PROC IMPORT: DATAFILE= within double quotes, then the path where the data file sits. The file details are over here, so let me copy the path from here.

This is my path. Then let's give the file name; the file is lr_data.csv, so this is my file name and I can take it from here. Let's close the double quotes, and then OUT=LR, that is, after importing, the data set will be created inside WORK and its name will be LR. REPLACE means that if LR already exists it will get replaced with the new SAS data set. Before I run this code, let me explain: we are importing the data set lr_data.csv, which is stored at this path.
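Putting the pieces together, the import step looks roughly like the sketch below. The file path is a placeholder for wherever lr_data.csv sits on your machine, and the DBMS=CSV and GETNAMES=YES options, while not spelled out in the recording, are the standard way to read a CSV with a header row:

    /* Import lr_data.csv into the WORK library as a data set named LR */
    proc import datafile="C:\path\to\lr_data.csv"  /* placeholder path */
                out=lr       /* output data set WORK.LR */
                dbms=csv     /* assumed: input is a CSV file */
                replace;     /* overwrite LR if it already exists */
        getnames=yes;        /* assumed: first row holds the variable names */
    run;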

So I've given the path and used PROC IMPORT with DATAFILE=, OUT=LR so that after importing the data set will be named LR in the SAS environment, and REPLACE so that any existing LR data set gets replaced. Then RUN. Let's run this code. This is the LR data set: it has 569 observations. My response variable is the dependent variable, where Y equals 1 means the cancer is malignant and Y equals 0 means it is benign. The rest are independent variables: radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, concavity_mean, concave_points_mean, symmetry_mean, fractal_dimension_mean, radius_se, texture_se, perimeter_se, area_se, smoothness_se, compactness_se, concavity_se, concave_points_se, symmetry_se, fractal_dimension_se, radius_worst, texture_worst, perimeter_worst, area_worst, smoothness_worst, compactness_worst, concavity_worst, concave_points_worst, symmetry_worst and fractal_dimension_worst. These are our independent variables. Now that we have imported and looked at the data set, let's run the logistic procedure. We'll use PROC LOGISTIC with DATA=LR; you have to give the keyword DESCENDING, and in the MODEL statement the dependent variable is the response variable.

Then I have to give all the independent variable names. Since there are many independent variables, I'll just give the first one, radius_mean, then a double dash, then the last one, fractal_dimension_worst, which selects everything from radius_mean through fractal_dimension_worst. Then SELECTION=STEPWISE, the keyword LACKFIT, and RUN. Let's first open the table: my starting variable is radius_mean and the last variable is fractal_dimension_worst, so that is the range I have given (as sketched below). Before I run this code, I'll explain. I've used PROC LOGISTIC with DATA=LR; LR is present inside WORK, so I did not give the library name. DESCENDING is given so the model is built for response equals 1. In the MODEL statement, response is the dependent variable and the independent variables run from radius_mean to fractal_dimension_worst. SELECTION=STEPWISE means stepwise selection will be done, that is, only the significant variables will be kept in our model.
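The stepwise run described here looks roughly like the sketch below; it assumes the CSV imported with underscore-separated column names (radius_mean through fractal_dimension_worst) and that the dependent variable is literally named response:

    /* Stepwise logistic regression on every predictor from radius_mean
       through fractal_dimension_worst (the double dash selects that range) */
    proc logistic data=lr descending;
        model response = radius_mean -- fractal_dimension_worst
              / selection=stepwise   /* keep only significant predictors */
                lackfit;             /* request the Hosmer-Lemeshow test  */
    run;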

Insignificant variables, which have less impact on the dependent variable or are insignificant for the model, will be automatically discarded through stepwise selection. We have also given LACKFIT because we are building the model for Y equals 1, and with this option we get the Hosmer-Lemeshow goodness-of-fit test to check whether our model is a good fit or not. So let me run this code. This is the logistic procedure output: the response variable's name is response, the number of observations is 569, and the response is sorted in descending order. The stepwise selection procedure is shown over here, including which variables are entered at each step; these variables are entered, and this is the summary of the stepwise selection.

So only these variables are significant, and they are entered. To check whether any further variables are needed, we look at the residual chi-square test, where the null hypothesis is that the model does not require any more variables and the alternative is that the model requires more variables. You can also see over here that the p-values of all the entered variables are less than 0.05, so we keep these variables as the significant ones that are required. Next is the analysis of maximum likelihood estimates; parameter estimation is always done using MLE because, as you know, the curve is the sigmoid curve and it is calculating a probability. Then come the odds ratio estimates. The percentage of concordant pairs in our data is very high, which is very good, meaning there is very little misclassification in our model.

Now, this is our Hosmer-Lemeshow goodness-of-fit test, where the p-value is 0.952. The null hypothesis for the Hosmer-Lemeshow test is that the model is a good fit, and the alternative is that the model is not a good fit. Since the p-value, 0.952, is greater than 0.05, we accept the null hypothesis, which means the model is a good fit. So according to my data the model is a good fit. Now let's move to the next step. Our next step will be showing the classification table. For that we are going to use PROC LOGISTIC DATA=LR DESCENDING, and in the MODEL statement, response equals the significant variables, so we'll take the variable names from the earlier output. These are the variables which are important: the first variable is concave_points_worst, and the next is radius_worst.

The next variable name is texture_worst, then radius_se, smoothness_worst, compactness_se and concavity_worst. These are the variables which are significant; there are seven significant variables in total, and for them we will show the classification table. The keyword CTABLE produces the classification table, and PPROB= gives the probability cutoffs: the probability value varies from 0 to 1 with a gap of 0.01, so a classification table at every level of probability from 0 to 1 will be shown. Then a semicolon and RUN. Before I run this code, let me explain all of it: PROC LOGISTIC DATA=LR, the data set name, and DESCENDING; in the MODEL statement, response is my dependent variable and these are my significant independent variables, which I copied from the earlier output, that is, concave_points_worst, radius_worst, texture_worst, radius_se, smoothness_worst, compactness_se and concavity_worst. CTABLE is for the classification table.
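A sketch of that classification-table run is below. The seven variable names are my reading of the audio; in practice you would list whichever variables the stepwise step retained in your own output:

    /* Re-fit with the retained predictors and request a classification
       table at every cutoff from 0 to 1 in steps of 0.01 */
    proc logistic data=lr descending;
        model response = concave_points_worst radius_worst texture_worst
                         radius_se smoothness_worst compactness_se concavity_worst
              / ctable pprob=(0 to 1 by 0.01);
    run;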

The classification table will be shown for probabilities from 0 to 1; the probability value lies between 0 and 1 with a gap of 0.01. So let's run this code. See, this is the classification table, where the correctly classified events, incorrectly classified events, correctly classified non-events and incorrectly classified non-events are given, along with the percentages, that is, sensitivity, specificity, false positive and false negative. Sensitivity is the correctly classified events divided by the total number of observed events; specificity is the correctly classified non-events divided by the total number of observed non-events; the false positive rate is the incorrectly classified events divided by the total number of predicted events; and the false negative rate is the incorrectly classified non-events divided by the total number of predicted non-events. We also get the percentage concordant, the odds ratio estimates and the analysis of maximum likelihood estimates, which we saw last time, and we are building the model for Y equals 1. So we have seen the classification table.

Now let's predict the probability for Y equals 1, so let's move to the next code. For this we'll use the same PROC LOGISTIC procedure, so let's copy this code. We'll just modify it a bit to predict the value, because we are going to use the same independent variables, as these are the significant variables. In order to predict, we'll use OUTPUT OUT=result, so that the predicted values are written to a data set called result, then P=predicted, and then RUN. Let's run this code. For this you should not just check the results viewer; you have to check the data set where the predicted probabilities are stored. This is my result data set. See, the predicted probabilities are over here; this is the estimated probability for Y equals 1.
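The prediction step is essentially the previous model with an OUTPUT statement added; a sketch, using the same assumed variable names:

    /* Same model, now writing the predicted probability of response=1
       to a new data set WORK.RESULT */
    proc logistic data=lr descending;
        model response = concave_points_worst radius_worst texture_worst
                         radius_se smoothness_worst compactness_se concavity_worst;
        output out=result p=predicted;   /* 'predicted' holds P(response = 1) */
    run;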

Now our next move will be to set a benchmark. I have set the benchmark at 0.5: if the estimated probability is greater than 0.5, that is, greater than 50%, then it will be an event, meaning the cancer is malignant; otherwise the cancer is benign. So let's now form the status variable based on the estimated probability. We'll use DATA pred_result; SET result;. pred_result is a new data set that I'm creating, and result is the original data set which was already created; both sit inside WORK, which is why I'm not specifying any library name. If predicted is greater than 0.5, that is 50%, then status equals 1, because we are building the model for Y equals 1 and the predicted probability for Y equals 1 is greater than 50%.

Then the cancer is malignant, so status equals 1; else status equals 0, and then RUN. So if the predicted probability is greater than 0.5, then status is 1, that is, the cancer is malignant; otherwise the cancer is benign. This manipulation is done in the pred_result data set, which is a copy of the original result data set already created inside WORK. Let's run this code. This is the pred_result data set; see, here is the status variable.
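That DATA step, as described, would look like this sketch:

    /* Flag each observation as an event using the 0.5 cutoff */
    data pred_result;
        set result;                          /* RESULT is in WORK, so no library prefix */
        if predicted > 0.5 then status = 1;  /* predicted malignant */
        else status = 0;                     /* predicted benign */
    run;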

So it is formed based on the benchmark: if the predicted probability is greater than 0.5, status is 1; otherwise it is 0. Now let's build the confusion matrix to check how accurate our model is. For that we'll use the procedure PROC FREQ with DATA=pred_result, then TABLES response*status. We do not want the row and column percentages, so we add the NOROW and NOCOL options, a semicolon and then RUN. Let's run this code.
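A sketch of that PROC FREQ call is below; NOROW and NOCOL are my reading of the garbled audio about suppressing percentages, so adjust the options if your output should keep them:

    /* Cross-tabulate observed response against predicted status (confusion matrix) */
    proc freq data=pred_result;
        tables response*status / norow nocol;   /* suppress row/column percentages */
    run;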

So see, this is my confusion matrix, where the correctly classified non-events are 352 and the correctly classified events are 203, and each cell shows the frequency and percentage. From here you basically have to calculate the accuracy, which is (352 + 203) divided by 569, times 100. So let's calculate the accuracy: 352 plus 203 is 555, divided by the total of 569, times 100, so my model is 97.5% accurate; the accuracy is quite high. From here you can also calculate sensitivity, which is the correctly classified events divided by the total number of observed events; specificity, which is the correctly classified non-events divided by the total number of observed non-events; the false positive rate, which is the incorrectly classified events divided by the total number of predicted events; and the false negative rate, which is the incorrectly classified non-events divided by the total number of predicted non-events. So this is the practical part of logistic regression.

So let me end this video over here. In the coming video we'll discuss Structured Query Language (SQL) using SAS. Thank you, goodbye. See you in the next video.
