Welcome to the clinical data management program using SAS. In this video, we will discuss how to interpret the different statistical measures using the UNIVARIATE procedure. For this, we first need to create our library using the LIBNAME statement. So let's create the library: LIBNAME, then we give the library name, that is CDM. Then we have to give the path where our data sets are present, so this is the path. Let's run the LIBNAME statement to see the data sets we have got in the SAS environment.
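The library step described above can be sketched as follows. The path is a placeholder, since the actual folder is not given in the video; the PROC CONTENTS step is an added convenience for listing what the library contains.

```sas
/* Hypothetical path: replace it with the folder that holds your data sets. */
libname cdm "/path/to/your/data";

/* List the data sets in the CDM library without per-variable detail. */
proc contents data=cdm._all_ nods;
run;
```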
Now we can access the data set. We'll be using the procedure called PROC UNIVARIATE: DATA equals CDM.disease, VAR average_commute, and then RUN. Before I run this procedure, let me first show you the disease data set. This is my disease data set. We have worked with this data set before, so I hope you all remember it. It has got around 2000 observations.
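If you prefer to inspect the data set in code rather than in the viewer, a minimal sketch could look like this. It assumes the library CDM is already assigned and that the data set is named disease, as in the video; the choice of ten rows is arbitrary.

```sas
/* Show the variable names, types and the number of observations. */
proc contents data=cdm.disease;
run;

/* Print only the first ten records to get a feel for the data. */
proc print data=cdm.disease (obs=10);
run;
```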
The variables are ID, gender, date of birth, zip code, [unclear] status, education, [unclear], children, average commute, daily internet use, available vehicles, military service, and the diseases that the patients are suffering from. So it is a data set based on patients' records. We are writing PROC UNIVARIATE, DATA equals CDM.disease. CDM is my library name and disease is my data set name. Then VAR average_commute, that is, we want to analyze our data with respect to this analysis variable, average commute, and then RUN. Let's run this code. See, the number of observations is 2000. The sum of weights and the mean are given, then the standard deviation, and the skewness is given. This is the coefficient of skewness. I hope you know that if the coefficient of skewness is equal to zero, the distribution is symmetric; if it is greater than zero, it is positively skewed; and if it is less than zero, it is negatively skewed.
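Written out, the code being run here would look roughly like the sketch below. The variable name average_commute is taken from how it is spoken in the video ("average underscore commute"), so check it against your own data set.

```sas
/* Default PROC UNIVARIATE output: moments, basic statistical measures,
   tests for location, quantiles and extreme observations. */
proc univariate data=cdm.disease;
  var average_commute;   /* analysis variable (name assumed from the narration) */
run;
```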
This is 0.02, which is a bit greater than zero, so the data is very slightly positively skewed and close to symmetric. Then you have the uncorrected sum of squares. The coefficient of variation is 33%. The coefficient of variation is basically the standard deviation divided by the mean, into 100, which here is 33%. It tells us how much the values vary from the mean. 33% is quite low, which is good, that is, the data is consistent. If it is close to 100%, the consistency is less. Since the coefficient of variation is about 33, quite a bit less than 100%, my data is more consistent and the inconsistency is less. Similarly, you have the variance. Next you have the kurtosis. You know the coefficient of kurtosis: if it is greater than three, the curve is leptokurtic; if it is less than three, the curve is platykurtic; and if it is equal to three, it is mesokurtic. Keep in mind that PROC UNIVARIATE prints the excess kurtosis, with three already subtracted, so on this output you compare the value against zero rather than three.
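To look at just the measures discussed here, without the rest of the UNIVARIATE output, one option is PROC MEANS with explicit statistic keywords. This is a sketch of an alternative, not the step shown in the video. CV is reported as 100 times the standard deviation divided by the mean, and the kurtosis is the excess kurtosis, so a normal curve gives a value near zero.

```sas
/* Request only the moments interpreted above. */
proc means data=cdm.disease n mean std var cv skewness kurtosis maxdec=2;
  var average_commute;   /* variable name assumed from the narration */
run;
```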
Next is your corrected sum of squares and the standard error of the mean. Then you have the basic statistical measures: mean, median, mode, standard deviation, variance, range and interquartile range. The interquartile range, as you know, is the difference between the third quartile and the first quartile, that is Q3 minus Q1. Then you have the tests for location, that is the t test, with the Student's t statistic and the p-value. Then you have the quantiles, which give the value at each level. The 100% quantile is the maximum value, and 99% is P99. If we talk in terms of percentages, we can call them percentiles. So 100% is the maximum value, then this is P99, P95, P90 and P75, and 50% means P50. P50 is equal to Q2, that is the median, the second quartile, which is also equal to the fifth decile. Then 25% is P25, that is Q1, then 10%, 5% and 1%, and 0% is the minimum. Then you have the extreme observations, that is the lowest values and the highest values. These are the values and these are the corresponding observation numbers.
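The percentiles in the Quantiles table can also be saved to a data set with the OUTPUT statement. The sketch below is an addition to what the video shows. The output data set name commute_pctl and the prefix P_ are illustrative choices.

```sas
/* Save selected percentiles, the extremes and the interquartile range. */
proc univariate data=cdm.disease noprint;
  var average_commute;                      /* name assumed from the narration */
  output out=work.commute_pctl              /* hypothetical output data set */
         pctlpts=1 5 10 25 50 75 90 95 99   /* percentiles to compute */
         pctlpre=P_                         /* variables become P_1, P_5, ... */
         min=minimum max=maximum qrange=iqr; /* P0, P100 and Q3 minus Q1 */
run;

proc print data=work.commute_pctl noobs;
run;
```

Here P_50 is the median (Q2), and iqr should match P_75 minus P_25.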
So, this is the concept of my UNIVARIATE procedure. Now, let me explain to you how to check normality using the UNIVARIATE procedure. We will be using the same procedure, that is PROC UNIVARIATE: PROC UNIVARIATE, DATA equals CDM.disease, VAR average_commute. Here you need to write NORMAL PLOT in the PROC statement to check the normality. Then let's run this code. See, we get the same statistical measures: the moments, such as kurtosis, variance and coefficient of variation; the basic statistical measures, that is mean, median, mode, standard deviation, variance, range and interquartile range; and the tests for location. This is the test for normality. See, the p-values are almost all greater than 0.05. The 95% confidence level is the default, that is a 5% level of significance. Since the p-values are greater than the level of significance, I cannot reject the null hypothesis that my data is normally distributed. These are my quantiles and my extreme observations, and this is my normal probability plot. So it is almost normally distributed: the points fall close to a straight line, like a 45 degree line. If the points fall perfectly on that line, then the data is normally distributed.
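The normality check described above can be sketched like this. NORMAL requests the tests for normality and PLOT requests the plots, including the normal probability plot. The QQPLOT statement is an optional addition, not something shown in the video. It draws a quantile-quantile plot with a reference line based on the estimated mean and standard deviation.

```sas
/* Same procedure as before, now with normality tests and plots. */
proc univariate data=cdm.disease normal plot;
  var average_commute;   /* name assumed from the narration */
  /* Optional: Q-Q plot with a normal reference line. */
  qqplot average_commute / normal(mu=est sigma=est);
run;
```

If the p-values in the Tests for Normality table are above 0.05 and the points stay close to the reference line, the data is consistent with a normal distribution.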
So this is almost normally distributed. This is the concept of the UNIVARIATE procedure, and this is how you interpret the statistical measures using it. In this video we have learned this much. In my coming video, I will discuss how to perform different tests of hypothesis. So let me end this video over here. Thank you.
Goodbye. See you all for the next video.