Welcome to the clinical data management program using SAS. In this video I will be discussing the concepts of descriptive statistics and inferential statistics. So, what are descriptive statistics? Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can be either a representation of the entire population or a sample of a population. Descriptive statistics are broken down into measures of central tendency and measures of variability, that is, spread. Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation, the variance, the minimum and maximum values, and the kurtosis and skewness, which describe the shape of the distribution. So, in short, descriptive statistics help describe and understand the features of a specific data set by giving short summaries about the sample and measures of the data. The most recognized types of descriptive statistics are measures of center, that is, measures of central tendency: the mean, median and mode, which are used at almost all levels of mathematics and statistics.
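Skewness and kurtosis are the least familiar measures in this list. The sketch below is a minimal plain-Python illustration (Python stands in for SAS here; the function name `moment_stats` and the two small data sets are hypothetical) that computes both from the central moments of the data: skewness as m3 / m2^1.5 and excess kurtosis as m4 / m2^2 - 3. It uses simple divide-by-n moments; statistical software, SAS included, typically applies small-sample corrections by default, so the values it reports may differ slightly.

```python
import math


def moment_stats(values):
    """Return (mean, population SD, skewness, excess kurtosis).

    mk is the k-th central moment with divisor n (no small-sample correction):
    skewness = m3 / m2**1.5, excess kurtosis = m4 / m2**2 - 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return mean, math.sqrt(m2), skewness, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # one large value creates a long right tail

print(moment_stats(symmetric))
print(moment_stats(right_skewed))
```

For the symmetric list the skewness is exactly 0 and the excess kurtosis is -1.3 (flatter than a normal curve); the single large value 10 in the second list produces a positive skewness.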
The mean, or average, is calculated by adding all the figures within the data set and then dividing by the number of figures within the set. For example, the sum of the data set 2, 3, 4, 5, 6 is 20, so the mean is 4, that is, 20 divided by 5. The mode of a data set is the value appearing most often, that is, the value corresponding to the maximum frequency, and the median is the figure situated in the middle of the ordered data set. The median is the middlemost observation: for an even number of observations it is the average of the (n/2)th and the (n/2 + 1)th observations, and for an odd number of observations it is the ((n + 1)/2)th observation. So, let's recap the concepts of descriptive statistics. Descriptive statistics summarize or describe the characteristics of a data set. They consist of two basic categories of measures: measures of central tendency and measures of variability, or spread. Measures of central tendency describe the center of a data set, while measures of variability describe the dispersion of data within the set. Measures of central tendency describe the central position of a distribution for a data set.
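The mean, median and mode rules above translate directly into code. This is a minimal Python sketch: the helper names are hypothetical (Python's standard `statistics` module already provides `mean`, `median` and `multimode`), and besides the lecture's data set 2, 3, 4, 5, 6 it uses two made-up lists to show the even-count median and the mode.

```python
from collections import Counter


def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th observation (1-based) sits at 0-based index n // 2.
        return ordered[mid]
    # Even n: average of the (n/2)-th and (n/2 + 1)-th observations (1-based).
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values that occur with the maximum frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


data = [2, 3, 4, 5, 6]
print(sum(data), mean(data), median(data))  # 20 4.0 4
print(median([2, 3, 4, 5]))                 # 3.5
print(modes([1, 2, 2, 3, 3, 3]))            # [3]
```

In 2, 3, 4, 5, 6 every value occurs once, so all five values tie for the mode; the mode is only informative when some value repeats.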
A person analyzes the frequency of each data point in the distribution and describes it using the mean, median or mode, which measure the most common patterns of the analyzed data set. Next are measures of variability, or measures of spread, which analyze how spread out the distribution is for a set of data. For example, while the measures of central tendency may give a person the average of a data set, they do not describe how the data is distributed within the set. So, while the average of the data may be 65 out of 100, there can still be data points at both 1 and 100. Measures of variability help communicate this by describing the shape and spread of the data set. Range, quartiles, absolute deviation and variance are all examples of measures of variability. Consider the following data set: 5, 19, 24, 62, 91, 100. The range of the data set is 95, because that is the difference between the highest and the lowest values (100 - 5).
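The measures of variability can be computed for the lecture's data set 5, 19, 24, 62, 91, 100. Here is a short sketch using Python's standard `statistics` module (Python is used only for illustration, and the variable names are hypothetical). Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) divisor, and that quartiles have several competing conventions, so other software may report slightly different quartile values.

```python
import statistics

data = [5, 19, 24, 62, 91, 100]

data_range = max(data) - min(data)  # highest value minus lowest value
mean = statistics.mean(data)
# Mean absolute deviation: average distance of each value from the mean.
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)
sample_variance = statistics.variance(data)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(data)           # square root of the sample variance
# Quartile cut points; Python's default 'exclusive' method is one of several conventions.
q1, q2, q3 = statistics.quantiles(data, n=4)
quartile_deviation = (q3 - q1) / 2           # semi-interquartile range

print(data_range)              # 95
print(round(mean, 2))          # 50.17
print(round(mean_abs_dev, 2))  # 34.17
print(q2)                      # 43.0 (the median)
print(round(sample_sd, 2), round(quartile_deviation, 2))
```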
So these are the concepts of descriptive statistics. You have measures of central tendency (mean, median and mode) and measures of variability, which are all measures of dispersion: standard deviation, variance, range, absolute deviation, quartile deviation, quartiles, coefficient of variation, and minimum and maximum values. You also have kurtosis, which tells you about the peakedness of the distribution, and skewness, which tells you about the symmetry of the distribution.
Next, let's move to the concept of inferential statistics. Inferential statistics allow you to make inferences about the population from sample data. The topics that come under inferential statistics are sampling distributions and estimation, hypothesis testing, correlation and regression, and their significance in data science. So first, let's discuss sampling distributions and estimation. A sample is a representative subset of a population. Conducting a census of the whole population is ideal but impractical.
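The difference between a census and a sample, and between a population value and its sample estimate, can be illustrated with a simulation. This sketch assumes a hypothetical population of 100,000 body weights drawn from a normal distribution with mean 60 kg and SD 8 kg (made-up numbers, not real data). It computes the population mean by "census" and then draws repeated simple random samples of several sizes with `random.Random.sample`.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated body weights in kg (not real data).
population = [rng.gauss(60, 8) for _ in range(100_000)]

# Census: measuring every unit gives the population parameter directly.
mu = statistics.fmean(population)
print(f"parameter (census mean): {mu:.2f}")

# Simple random samples (without replacement) give statistics that estimate mu.
# Repeating the draw 500 times shows how much the statistic varies at each size.
spread_by_n = {}
for n in (10, 100, 1000):
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(500)]
    spread_by_n[n] = statistics.stdev(means)  # empirical standard error of the mean
    print(f"n={n:>4}: one sample mean = {means[0]:.2f}, "
          f"SD of 500 sample means = {spread_by_n[n]:.2f}")
```

The spread of the sample means should shrink roughly like 8/sqrt(n), that is, about 2.5, 0.8 and 0.25 kg for n = 10, 100 and 1,000, which is the sense in which a larger sample gives a more accurate estimate.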
In most cases, sampling is much more practical; however, it is prone to sampling error. A sample that is not representative of the population is called biased, and a sampling method that produces such samples suffers from sampling bias. Convenience bias, judgment bias, size bias and response bias are the main types of sampling bias. The best technique for reducing bias in sampling is randomization. Simple random sampling is the simplest of the randomization techniques; systematic sampling, cluster sampling and stratified sampling are other probability sampling techniques. Sample means become more and more normally distributed around the true mean (the population parameter) as we increase the sample size, and the variability of the sample mean decreases as the sample size increases. So, basically, a sample is a subset of a population: we draw a sample from a population, and we draw it in such a way that the sample is a true representative of the population. Any measure computed from a sample is called a statistic.
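The probability sampling techniques named above differ only in how units are selected. The following is a sketch under assumed conditions: a hypothetical sampling frame of 1,000 subject IDs spread evenly over 4 study sites (the strata) and grouped into blocks of 50 consecutive IDs (the clusters). All function names are illustrative, not from any library.

```python
import random
from collections import defaultdict

rng = random.Random(0)  # fixed seed for reproducibility

# Hypothetical sampling frame: 1,000 subject IDs, each enrolled at one of 4 study sites.
frame = [(subject_id, f"site_{subject_id % 4}") for subject_id in range(1000)]


def site_of(unit):
    return unit[1]


def block_of(unit):
    # Hypothetical natural clusters: consecutive blocks of 50 subject IDs.
    return unit[0] // 50


def group_by(units, key):
    groups = defaultdict(list)
    for unit in units:
        groups[key(unit)].append(unit)
    return groups


def simple_random_sample(units, k):
    """Every unit has the same chance of selection; drawn without replacement."""
    return rng.sample(units, k)


def systematic_sample(units, k):
    """Random start, then every `step`-th unit of the ordered frame."""
    step = len(units) // k
    start = rng.randrange(step)
    return units[start::step][:k]


def stratified_sample(units, k, stratum_of):
    """Simple random sample inside each stratum, sized in proportion to the stratum.

    Rounding can make the total differ slightly from k when strata sizes are uneven.
    """
    sample = []
    for members in group_by(units, stratum_of).values():
        share = round(k * len(members) / len(units))
        sample.extend(rng.sample(members, share))
    return sample


def cluster_sample(units, n_clusters, cluster_of):
    """Randomly choose whole clusters and keep every unit inside them."""
    clusters = group_by(units, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


print(len(simple_random_sample(frame, 100)))        # 100
print(len(systematic_sample(frame, 100)))           # 100
print(len(stratified_sample(frame, 100, site_of)))  # 100 (25 from each site)
print(len(cluster_sample(frame, 2, block_of)))      # 100 (2 blocks of 50)
```

Stratified sampling guarantees that every site is represented in proportion to its size, while cluster sampling usually gives up some precision in exchange for the convenience of surveying whole groups.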
Any measure computed for the whole population is called a parameter. So, we use the value of a sample statistic to estimate the value of the corresponding population parameter. The larger the sample size, the higher the probability that the sample statistic is close to the population parameter, and so the more accurate your estimate of the population parameter. Next is the concept of hypothesis testing. Hypothesis testing is a kind of statistical inference that involves asking a question, collecting data, and then examining what the data tell us about how to proceed. The hypothesis to be tested is called the null hypothesis and is given the symbol H0. We test the null hypothesis against an alternative hypothesis, which is given the symbol Ha or H1.
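For a concrete case, take a question such as whether the mean weight of a student population is 60 kg, that is, H0: μ = 60 versus H1: μ ≠ 60. The sketch below runs a one-sample t-test by hand on ten hypothetical weights (made-up values, not real data). To stay within Python's standard library it compares |t| with the tabulated two-sided 5% critical value of Student's t for 9 degrees of freedom, 2.262, instead of computing a p-value.

```python
import math
import statistics

# Hypothetical weights (kg) of 10 randomly sampled students; illustrative only.
weights = [58.2, 61.5, 63.0, 59.4, 66.1, 62.3, 57.8, 64.9, 60.7, 65.2]
mu0 = 60.0  # population mean claimed by H0

n = len(weights)
x_bar = statistics.fmean(weights)
s = statistics.stdev(weights)  # sample SD with the n - 1 divisor
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

# Two-sided critical value of Student's t at alpha = 0.05 with n - 1 = 9 degrees
# of freedom, taken from a standard t table; valid only for n = 10.
T_CRIT = 2.262

decision = "reject H0" if abs(t_stat) > T_CRIT else "fail to reject H0"
print(f"mean = {x_bar:.2f} kg, t = {t_stat:.3f} -> {decision}")
```

With these values t is about 2.06, below 2.262, so the data do not give sufficient evidence against H0 at the 5% level. If SciPy is installed, `scipy.stats.ttest_1samp(weights, 60)` returns the same t statistic together with a p-value; in SAS, the corresponding analysis is PROC TTEST with the H0=60 option.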
So, with the null hypothesis you are basically making a proposition, and the alternative hypothesis is the complement of that proposition. Say you want to test whether the average weight of Calcutta University students is 60 kilos or not. Then the null hypothesis is that the average weight of Calcutta University students is 60 kilos, and the alternative hypothesis is that it is not equal to 60 kilos. The alternative hypothesis is always the complement of the null hypothesis. Next comes the concept of correlation and regression. Correlation refers to a mutual relationship or association between quantitative variables. It gives the degree of association between two quantitative variables, that is, whether they are correlated with each other or not.
It also tells us how strongly they are correlated. Variables can be positively correlated, in which case we get an upward-sloping graph; they can be negatively correlated, in which case we get a downward-sloping graph; or they can be uncorrelated, in which case the data points are scattered with no upward or downward trend. To measure correlation, we compute the correlation coefficient, which lies between -1 and +1: its sign gives the direction of the relationship and its absolute value gives the strength. As a rough rule of thumb, when the absolute value of the correlation coefficient is about 0.5 we say the data are moderately correlated, when it is above 0.5 we say they are highly correlated, and when it is lower there is only weak correlation between the quantitative variables. Now let's understand how inferential statistics is used in data science.
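The correlation coefficient described above is usually Pearson's r: the covariance of the two variables divided by the product of their standard deviations. A minimal Python sketch follows; the function name `pearson_r` and the small data sets are illustrative (Python 3.10 and later also ship `statistics.correlation`). It shows that r stays between -1 and +1, with the sign matching the direction of the slope.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: sum of cross-deviations / sqrt(product of sums of squares)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive, upward slope)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative, downward slope)
print(pearson_r(x, [3, 1, 4, 1, 5]))   # about 0.35 (weak positive)
```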
Inferential statistics can be used in data science in many ways: first, making inferences about the population from the sample; next, concluding whether a sample is significantly different from the population; then, deciding whether adding or removing a feature from a model will really help to improve the model, and whether one model is significantly better than another; and, finally, hypothesis testing in general. So these are the concepts of descriptive statistics and inferential statistics. We'll stop here for this video; in the coming video I'll be discussing the concepts of interval estimation and confidence intervals.
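Deciding whether one model is significantly better than another is itself a hypothesis test. One simple option, sketched here as an illustration rather than a standard recipe, is a paired sign-flip permutation test on per-fold scores: under H0 (no difference) each fold's score difference is equally likely to be positive or negative. The accuracies below are made up and the function name is hypothetical. Because cross-validation folds share training data, their scores are not fully independent, so treat the resulting p-value as approximate.

```python
import random

# Hypothetical per-fold accuracies of two models on the same 10 cross-validation folds.
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.80, 0.85, 0.81]
model_b = [0.83, 0.80, 0.86, 0.83, 0.84, 0.80, 0.85, 0.81, 0.87, 0.84]


def paired_permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo sign-flip test of H0: mean paired difference is zero."""
    rng = random.Random(seed)
    diffs = [y - x for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the labels A and B are exchangeable within each fold,
        # so each difference is equally likely to carry either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


p_value = paired_permutation_test(model_a, model_b)
print(f"p = {p_value:.4f}")
```

Every made-up fold favours model B, and the exact sign-flip p-value for that pattern is 2/1024 (about 0.002), so the Monte Carlo estimate should come out close to that. The same test applies to a model fitted with and without a candidate feature.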
Thank you. Goodbye, and see you in the next video.