Understanding the concepts of Logistic Regression part - 2

Clinical Data Management Using SAS Categorical Data Analysis
9 minutes

Transcript

Welcome to the clinical data management program using SAS. In this video we will discuss further concepts of logistic regression. Last time we stopped at the concept of concordant and discordant pairs, so we will first cover more on concordant and discordant pairs and then move on to the other concepts of logistic regression. As you all know, concordant pairs are cases where the predicted probability of the event is greater than the predicted probability of the non-event, discordant pairs are cases where the probability of the event is less than the probability of the non-event, and tied pairs are cases where the probability of the event is equal to the probability of the non-event. Now let's understand the steps to calculate concordance and discordance. The first step is to calculate the predicted probabilities from the logistic regression model. Second, divide the data into two data sets. One data set contains the observations whose actual value of the dependent variable is 1, that is, the events, together with their corresponding predicted probability values.

The other data set contains the observations whose actual value of the dependent variable is 0, that is, the non-events, together with their predicted probability values. The next step is to compare each predicted value in the first data set with each predicted value in the second data set. The total number of pairs we get is x times y, because this is a Cartesian product: x is the number of observations in the first data set, with an actual value of 1 in the dependent variable, and y is the number of observations in the second data set, with an actual value of 0 in the dependent variable. In this step we are performing a Cartesian product (cross join) of events and non-events. For example, if you have 100 events and 1,000 non-events, it creates 100,000 pairs for comparison. In the next step, a pair is concordant if the 1 observation, with the desired outcome (the event), has a higher predicted probability than the 0 observation, without the outcome (the non-event).

So a pair is concordant when the predicted probability of the event is greater than the predicted probability of the non-event. A pair is discordant when the predicted probability of the event is less than the predicted probability of the non-event. So, for a concordant pair, the predicted probability of the event should be greater than the predicted probability of the non-event; for a discordant pair, the predicted probability of the non-event should be greater than the predicted probability of the event; and a tied pair is a pair where the predicted probability of the event is equal to the predicted probability of the non-event. Now the final percent values are calculated using the formulas below. Percent concordant is the number of concordant pairs divided by the total number of pairs. Percent discordant is the number of discordant pairs divided by the total number of pairs. Percent tied is the number of tied pairs divided by the total number of pairs. The area under the curve, the c statistic, is equal to percent concordant plus 0.5 times percent tied, with the percentages taken as proportions.
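The steps above can be sketched in a few lines of Python. This is only an illustration, not the SAS procedure used in the course: the function name `concordance_stats` and the five-observation sample data are assumptions made up for the example, and the percentages are returned as proportions between 0 and 1.

```python
from itertools import product


def concordance_stats(actual, predicted):
    """Count concordant, discordant and tied pairs and compute the c statistic."""
    # Split the predicted probabilities by actual outcome (1 = event, 0 = non-event).
    events = [p for y, p in zip(actual, predicted) if y == 1]
    non_events = [p for y, p in zip(actual, predicted) if y == 0]

    concordant = discordant = tied = 0
    # Cartesian product: every event is compared with every non-event.
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1
        elif p_event < p_non_event:
            discordant += 1
        else:
            tied += 1

    total_pairs = len(events) * len(non_events)
    pct_concordant = concordant / total_pairs
    pct_discordant = discordant / total_pairs
    pct_tied = tied / total_pairs
    return {
        "total_pairs": total_pairs,
        "concordant": concordant,
        "discordant": discordant,
        "tied": tied,
        "pct_concordant": pct_concordant,
        "pct_discordant": pct_discordant,
        "pct_tied": pct_tied,
        # c statistic = percent concordant + 0.5 * percent tied
        "c": pct_concordant + 0.5 * pct_tied,
    }


# Hypothetical sample: 3 events and 2 non-events give 3 * 2 = 6 pairs.
actual = [1, 1, 1, 0, 0]
predicted = [0.9, 0.6, 0.3, 0.4, 0.3]
stats = concordance_stats(actual, predicted)
print(stats["concordant"], stats["discordant"], stats["tied"])  # 4 1 1
print(round(stats["c"], 4))  # 0.75
```

The nested comparison grows as x times y, so for large data sets a sort-based or rank-based calculation is normally preferred; the explicit loop is kept here because it mirrors the steps as described.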

So these are the steps you need to follow to calculate the total amount of concordance and discordance in your model. Next, the most important thing is setting the cut point for the probability level. In logistic regression, the dependent variable we are modeling is the probability that Y equals 1. We have to convert this to a binary variable that takes the value 0 or 1, and in order to do that we have to set a probability level. This probability level may differ from company to company; it is set according to the company's rules and policy. If the predicted probability is greater than the cutoff level, the value will be 1, that is, it is denoted as an event, and if the predicted probability is less than the cutoff level, the value will be 0, that is, it is denoted as a non-event. Since our dependent variable is the probability that Y equals 1, the event, if the predicted probability is greater than the cutoff level I call it an event, meaning the event will occur; otherwise it is a non-event. For example, suppose that in a company the cutoff probability level is set at 0.5.

So if the predicted probability is greater than 0.5, the value is 1, that is, it is an event and the event has occurred. And if the predicted probability is less than 0.5, it is a non-event, that is, the value is 0. So logistic regression estimates the probability of an event, and the dependent variable is converted to a binary response variable using the cutoff probability level. Every company sets this probability level before building a model, such that if the predicted probability is greater than the cutoff level it is denoted as an event, otherwise it is a non-event. The response variable then takes binary values, that is, 0 or 1. Next, the most important thing is the confusion matrix. This is a matrix used to measure the accuracy of your logistic regression model. The confusion matrix is the most crucial metric commonly used to evaluate classification models.
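Before turning to the confusion matrix, the cutoff rule described above can be sketched as follows. The function name `classify` and the sample probabilities are hypothetical. The transcript does not say what happens when a probability is exactly equal to the cutoff, so treating that case as a non-event is an assumption of this sketch.

```python
def classify(probabilities, cutoff=0.5):
    """Convert predicted probabilities to binary values using a cutoff level."""
    # Greater than the cutoff -> 1 (event); otherwise -> 0 (non-event).
    # Assumption: a probability exactly equal to the cutoff is a non-event.
    return [1 if p > cutoff else 0 for p in probabilities]


# Hypothetical predicted probabilities from a fitted model.
predicted_prob = [0.82, 0.35, 0.50, 0.67, 0.12]
labels = classify(predicted_prob, cutoff=0.5)
print(labels)  # [1, 0, 0, 1, 0]
```

Changing `cutoff` changes how many observations are labelled as events, which is why the choice of level is treated as a policy decision.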

So, this is an example of a confusion matrix. These are the observed values, that is, observed as 0 and observed as 1, these are the predicted values, that is, predicted as 0 and predicted as 1, and then these are the totals. There are 50 cases where the observed value is 0 and the predicted value is also 0, and there are 30 cases where the observed value is 0 and the predicted value is 1, so the total is 80. There are 40 cases where the observed value is 1 and the predicted value is 0, and 80 cases where the observed value is 1 and the predicted value is 1, so the total here is 120, and the grand total is 200. This is my confusion matrix. Now, there are a number of measures that we can calculate using the confusion matrix.
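As an illustration, the matrix just described can be rebuilt by counting pairs of observed and predicted values. The lists below are synthetic, constructed only to reproduce the counts in the example. The cell for observed 0 and predicted 0 is taken to be 50, which is what the stated row total of 80 implies.

```python
from collections import Counter

# Synthetic data laid out to match the example counts:
# 50 (obs 0, pred 0), 30 (obs 0, pred 1), 40 (obs 1, pred 0), 80 (obs 1, pred 1).
observed = [0] * 50 + [0] * 30 + [1] * 40 + [1] * 80
predicted = [0] * 50 + [1] * 30 + [0] * 40 + [1] * 80

# Count every (observed, predicted) combination.
counts = Counter(zip(observed, predicted))

# Rows are observed 0 and 1; columns are predicted 0 and 1.
matrix = [[counts[(o, p)] for p in (0, 1)] for o in (0, 1)]
row_totals = [sum(row) for row in matrix]
col_totals = [sum(col) for col in zip(*matrix)]
grand_total = sum(row_totals)

print("            pred 0  pred 1  total")
for label, row, total in zip(("observed 0", "observed 1"), matrix, row_totals):
    print(f"{label}  {row[0]:6d}  {row[1]:6d}  {total:5d}")
print(f"total       {col_totals[0]:6d}  {col_totals[1]:6d}  {grand_total:5d}")
```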

Let's move on to the measures of the confusion matrix. First, a non-event: a non-event is the opposite of an event. A correctly classified event is, for a particular probability level, a case where the prediction is an event and the observed outcome is also an event, so it counts the cases that are predicted as an event and observed as an event. A correctly classified non-event is, for a particular probability level, a case where the prediction is a non-event and the observed outcome is also a non-event. Next, an incorrectly classified event is, for a particular probability level, a case where the prediction is an event but the observed outcome is a non-event. An incorrectly classified non-event is, for a particular probability level, a case where the prediction is a non-event but the observed outcome is an event. So for this example, the number of incorrectly classified events is 30 and the number of incorrectly classified non-events is 40.

Percentage correct is the total number of correct predictions, that is, the total number of correctly classified events and non-events, divided by the total number of observations, multiplied by 100. Here that is 50 plus 80, divided by the 200 observations, multiplied by 100. Next is sensitivity. Sensitivity measures the ability to predict an event correctly: it is the number of correctly classified events divided by the total number of observed events, multiplied by 100. Then specificity: it measures the ability to predict a non-event correctly. It is the number of correctly classified non-events divided by the total number of observed non-events, multiplied by 100. Next is false positive: false positive is the number of cases incorrectly predicted as events divided by the total number of cases predicted as events, multiplied by 100. And false negative is the number of cases incorrectly predicted as non-events divided by the total number of cases predicted as non-events, multiplied by 100. Sensitivity is also known as the true positive rate, and specificity is also known as the true negative rate.
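A short Python sketch shows these measures for the example confusion matrix. The variable names are hypothetical, and the four counts are the ones read from the example (50, 30, 40 and 80, where 50 is inferred from the row total of 80). False positive and false negative are computed here against the totals of predicted events and predicted non-events, following the definitions given above.

```python
# Cell counts from the example confusion matrix (50 is inferred from the row total).
correct_non_events = 50    # observed 0, predicted 0
incorrect_events = 30      # observed 0, predicted 1
incorrect_non_events = 40  # observed 1, predicted 0
correct_events = 80        # observed 1, predicted 1

total = correct_non_events + incorrect_events + incorrect_non_events + correct_events
observed_events = correct_events + incorrect_non_events
observed_non_events = correct_non_events + incorrect_events
predicted_events = correct_events + incorrect_events
predicted_non_events = correct_non_events + incorrect_non_events

# All measures are expressed as percentages.
percent_correct = 100 * (correct_events + correct_non_events) / total
sensitivity = 100 * correct_events / observed_events
specificity = 100 * correct_non_events / observed_non_events
false_positive = 100 * incorrect_events / predicted_events
false_negative = 100 * incorrect_non_events / predicted_non_events

print(f"Percentage correct: {percent_correct:.1f}")  # 65.0
print(f"Sensitivity:        {sensitivity:.1f}")      # 66.7
print(f"Specificity:        {specificity:.1f}")      # 62.5
print(f"False positive:     {false_positive:.1f}")   # 27.3
print(f"False negative:     {false_negative:.1f}")   # 44.4
```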

The accuracy of the model can be checked from the confusion matrix through the percentage correct. The higher the percentage correct, the better our model classifies events and non-events. Now, this is a curve known as the ROC curve, which is also used to assess how good our model fit is. The ROC curve is also used to measure the accuracy of the model: the area under the ROC curve measures the accuracy, and the larger the area, that is, the closer it is to the value 1, or 100%, the better the model fit and the higher the accuracy of the model. The ROC curve, also known as the receiver operating characteristic curve, is plotted between 1 minus specificity and sensitivity: sensitivity is taken on the y-axis and 1 minus specificity on the x-axis. So the ROC curve is used to assess the accuracy of a classification model.
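To make the construction concrete, here is a minimal Python sketch that sweeps the cutoff over the distinct predicted probabilities, records 1 minus specificity and sensitivity at each cutoff, and estimates the area under the curve with the trapezoidal rule. The function names and sample data are hypothetical. Unlike a strict greater-than rule, the sweep classifies an observation as an event when its probability is greater than or equal to the cutoff, an assumption made so that the lowest cutoff reaches the point (1, 1).

```python
def roc_points(actual, predicted):
    """Return (1 - specificity, sensitivity) points, starting from (0, 0)."""
    n_events = sum(actual)
    n_non_events = len(actual) - n_events
    points = [(0.0, 0.0)]
    # Sweep cutoffs from the highest predicted probability down to the lowest.
    for cutoff in sorted(set(predicted), reverse=True):
        true_pos = sum(1 for y, p in zip(actual, predicted) if y == 1 and p >= cutoff)
        false_pos = sum(1 for y, p in zip(actual, predicted) if y == 0 and p >= cutoff)
        # x = 1 - specificity, y = sensitivity (both as proportions).
        points.append((false_pos / n_non_events, true_pos / n_events))
    return points


def area_under_curve(points):
    """Trapezoidal-rule area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical sample: 3 events and 2 non-events.
actual = [1, 1, 1, 0, 0]
predicted = [0.9, 0.6, 0.3, 0.4, 0.3]
points = roc_points(actual, predicted)
auc = area_under_curve(points)
print(points)
print(round(auc, 4))  # 0.75
```

For this sample the pair counts are 4 concordant, 1 discordant and 1 tied out of 6, so percent concordant plus 0.5 times percent tied is also 0.75, which agrees with the area computed from the curve.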

The ROC curve works with user-defined threshold values, and it determines the model's accuracy using the area under the curve. The area under the curve, also referred to as the index of accuracy or the concordance index, represents the performance of the model: the higher the area, the better the model. The ROC curve is plotted with sensitivity on the y-axis and 1 minus specificity on the x-axis. So we will be ending here for this video. In the coming video, we will move on to the practical sessions of logistic regression for CDM in SAS. For now, I am ending this video here. Thank you, goodbye, and see you in the next video.
