Sorting data and removing duplicates

Clinical Data Management Using SAS Exploring and Validating Data
8 minutes

Transcript

Welcome to the Clinical Data Management program using SAS. In this video, we will discuss how to sort our data, how to group our data, and how to remove duplicates. We'll be using the same library as before, CDM, and our disease data. Sorting means ordering the data with respect to one or two variables, in either ascending or descending order. Then we'll group our data with respect to one variable, and finally we'll remove the duplicates from our data, that is, we will display only the unique values. First, let's start with how to sort our data. We are going to use the procedure called PROC SORT.

DATA equals CDM.disease, OUT equals disease2. I am using the keyword OUT because we do not want to modify our original data set, that is, CDM.disease. I'm creating a duplicate data set called disease2, which is a copy of our original data set disease from CDM. We are going to sort the disease2 data set with respect to a particular variable or variables, in ascending or descending order, keeping our original data set intact. Next comes the BY statement. Marital status is our primary variable, that is, the first variable with respect to which sorting will be done, and the secondary variable is gender. We are going to sort with respect to these two variables, in ascending order. When we do not mention anything in our BY statement, the default ordering is ascending.
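The step dictated above can be sketched roughly as follows. The variable names marital_status and gender are assumptions, since the video only speaks them aloud, and the CDM library is assumed to be already assigned as in the earlier lessons.

```sas
/* Assumes the CDM library is already assigned (earlier lessons).   */
/* marital_status and gender are assumed spellings of the variables. */
proc sort data=cdm.disease out=disease2;  /* OUT= leaves cdm.disease untouched */
    by marital_status gender;             /* primary key first, then secondary; ascending by default */
run;
```

Without OUT=, PROC SORT would replace the input data set with its sorted version, which is why the copy is written to a separate data set here.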

If you want descending order, you have to use the keyword DESCENDING, which we are going to do in our next code. So let's run this. PROC SORT is a procedure that does not display anything in the Results Viewer, so you have to open the data set to see the result of the PROC SORT procedure. Here we have sorted our duplicate data set, disease2, with respect to the variables marital status and gender in ascending order. Since we did not specify a library in OUT, by default the data set is created inside the WORK library, which is a temporary library. So let's open our WORK library. This is our WORK library, containing disease2. See, it is sorted with respect to marital status and gender. First it is sorted with respect to marital status and then by gender: marital status is the primary variable and gender is the secondary. Now I'll show you how to sort the data in descending order.

We are going to use the same procedure, PROC SORT. It is PROC SORT, DATA equals (let's close the view) CDM.disease, OUT equals disease3, semicolon. Then BY DESCENDING, and the variable we're using is average commute: we are going to sort the data set in descending order with respect to the variable average commute. Then RUN. So let's open the disease3 data set. See, the data set is sorted with respect to average commute in descending order, that is, the highest values are at the top and the lowest values are at the bottom. This is how you sort your data set in descending order. Now we'll group the data set with respect to marital status, and for each group of marital status we will find the sum of daily internet use. In order to group a data set with respect to a variable, we first have to sort the data set with respect to that variable.
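The descending sort described above might look like this. The name avg_commute is a hypothetical spelling of the "average commute" variable; the actual name in CDM.disease is not shown in the transcript.

```sas
/* avg_commute is an assumed variable name for "average commute". */
proc sort data=cdm.disease out=disease3;
    by descending avg_commute;  /* DESCENDING applies only to the variable that follows it */
run;
```

When sorting by several variables, DESCENDING has to be repeated before each variable that should be in descending order; any variable without it is sorted ascending.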

We had already sorted our disease2 data set with respect to marital status before, so let's group it. The necessary condition for grouping is that you first sort the data set with respect to the variable by which you want to group. So first sort the data set with respect to a variable, then group the data set with respect to that variable. We are going to use the procedure PROC PRINT: DATA equals disease2, BY marital status. This is the grouping statement; it is going to group with respect to marital status. Then SUM daily_internet_use, that is, for each group of marital status, the sum of daily internet use will be displayed. Semicolon, and then RUN. Let's run this code. Let's look at the Results Viewer. See, the first marital status is single. The sum of daily internet use is given for the single marital status.
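A minimal sketch of the sort-then-group pattern described above. The sort step is repeated here so the example stands on its own; marital_status is an assumed variable name, while daily_internet_use follows the spelling dictated in the video.

```sas
/* BY-group processing requires the data to be sorted by the BY variable first. */
proc sort data=cdm.disease out=disease2;
    by marital_status;           /* assumed variable name */
run;

proc print data=disease2;
    by marital_status;           /* one section of output per marital status */
    sum daily_internet_use;      /* total for each BY group, plus an overall total */
run;
```

If the input is not sorted by the BY variable, PROC PRINT stops with an error saying the data set is not sorted in ascending sequence.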

And these are the subtotals and the grand total. There are two groups, single and married. So we got the subtotal for single, we got it for married, and we got the grand total. Next, suppose you want to display each group on a separate page and modify the style of the report. Then we can use the PAGEBY and ID statements. The same grouping will be done, but in a different form. We are going to use the same procedure, PROC PRINT: DATA equals disease2, BY marital status, PAGEBY marital status, that is, each group of marital status will be displayed on a separate page. Then ID marital status, SUM daily internet use, and then RUN. First of all, the data is grouped with respect to marital status, as specified in the BY statement.

Each group will be displayed on a separate page because of PAGEBY. ID is used to display marital status in the leftmost column. By default we get the observation column; when we give an ID statement, the observation column gets replaced by the variable specified in the ID statement, here marital status. In the earlier PROC PRINT we got the observation column by default; here the observation column, that is, the leftmost column or serial number column, will be replaced by the marital status column. Next is SUM daily internet use, that is, for each group of marital status we'll get the sum of daily internet use. So let's run this code. See, marital status is in the leftmost column, and each group is displayed on a separate page. This one is for single and this one is for married. We have the subtotals of daily internet use as well as the grand total.
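The paged report described above could be written along these lines. As before, marital_status is an assumed variable name, and the sort step is included only so the sketch runs on its own.

```sas
proc sort data=cdm.disease out=disease2;
    by marital_status;           /* assumed variable name */
run;

proc print data=disease2;
    by marital_status;           /* PAGEBY only works together with a BY statement */
    pageby marital_status;       /* start a new page whenever marital_status changes */
    id marital_status;           /* replaces the default Obs column on the left */
    sum daily_internet_use;      /* subtotal per group and a grand total */
run;
```

The variable named in PAGEBY must also appear in the BY statement, which is why both statements list marital_status.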

Now, removing the duplicates. For that, we again use the procedure PROC SORT on our disease data, and we use the keyword NODUPKEY to remove the duplicates. We are forming a temporary data set, that is, we will not modify our original data set. We will not remove the duplicates from our original data set; we will remove them from our temporary data set. That's why we are creating a temporary data set, disease_n. Then BY gender: the data is sorted with respect to gender, and the duplicate values of gender will be removed. Then RUN. So let's run this code. Here is the disease_n data set. It consists of only two observations, because we removed the duplicate values with respect to gender, leaving female and male. For now, let me end this video here.
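A hedged sketch of the de-duplication step. The transcript does not clearly state which data set is read, so CDM.disease is assumed as the input here; disease_n is the output name as best it can be made out from the video.

```sas
/* Input data set is an assumption; the video only makes clear that  */
/* the original data must stay unmodified and the result goes to OUT=. */
proc sort data=cdm.disease out=disease_n nodupkey;
    by gender;   /* keep only the first observation for each value of gender */
run;
```

NODUPKEY compares only the BY variables, so the other columns of each kept row simply come from whichever observation with that gender value sorted first.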

In our next video, we'll cover the topic of computing data columns, or computing variables, that is, preparing our data. So thank you. Goodbye, and see you in the next video.
