9 SASmarkdown
This section shows how to write SAS code inside an R Markdown workflow using the SASmarkdown package. The goal is to keep one reproducible document (HTML/PDF) while allowing SAS to do what SAS is good at: data steps, PROC routines, reporting, and macros.
Two ideas drive the design:
- knitr engines: a code chunk can be executed by an external program (SAS) instead of R.
- chunk isolation: in most SASmarkdown workflows, each SAS chunk runs as its own SAS “session script,” so objects created in one chunk may not automatically exist in another chunk (unless you explicitly write/read datasets).
This is why SASmarkdown feels different from pure SAS Studio and also different from pure R Markdown.
9.1 How to install SASmarkdown
The SASmarkdown package and documentation are here:
sasmarkdown
The typical setup steps are:
- Install and load
SASmarkdownin R. - Tell knitr where SAS is installed (the
sas.exepath). - Set SAS options (log, line size, batch mode).
- Set the chunk engine to a SAS engine such as
sas,saslog,sashtml, etc.
9.1.1 Use an engine
The first chunk loads the package. You can also set global knitr options here, but the key is that SASmarkdown must be available.
## Loading required package: SASmarkdown
## Warning: package 'SASmarkdown' was built under R version 4.4.3
## SAS found at C:/Program Files/SASHome/SASFoundation/9.4/sas.exe
## SAS engines are now ready to use.
Next, we define the SAS executable path and options, then tell knitr to use sashtml as the default engine.
What this means operationally:
- knitr writes your SAS code to a temporary
.sasfile, - calls
sas.exein batch mode, - captures the HTML output produced by SAS,
- embeds that output into your knitted document.
saspath <- "C:/Program Files/SASHome/SASFoundation/9.4/sas.exe"
sasopts <- "-nosplash -ls 75"
knitr::opts_chunk$set(engine='sashtml', engine.path=saspath,
engine.opts=sasopts, comment="")Common troubleshooting notes (Windows):
- If the path contains spaces, it must still be a valid Windows path string.
- If you have multiple SAS versions, confirm the correct sas.exe.
- Some corporate environments block batch execution; SASmarkdown may fail unless permissions allow it.
9.2 Common statements
Two practical rules matter most when writing SAS inside R Markdown:
Do not use R syntax inside SAS chunks
SAS chunks are sent to SAS verbatim. R objects likemydata,paste(), or%>%have no meaning there.Data typically cannot be used across chunks unless you persist it
Many SASmarkdown setups run each SAS chunk independently. If you createdata test; ... run;in one chunk, the next chunk may not “see”work.testunless the session is preserved or you write it out to a permanent library.
A safe habit is: - treat each SAS chunk as self-contained, or - write datasets to a permanent location/library and read them back later.
9.2.1 Read in data using informats (date data)
9.2.1.1 Example 1: a minimal DATA step with inline data
This example uses cards; ... ; to create a dataset directly.
It is the fastest way to create toy data for teaching, testing, or demonstrations.
/*import and export raw data using infile*/
data test;
/*infile "c:/document/data.text";*/
input name $ height weight;
length height 4;
cards
;
Daniel 173 150
;
run;
proc print;run;
Obs name height weight
1 Daniel 173 150
Key points:
- input name $ height weight; declares name as character ($) and the others numeric.
- length height 4; sets storage length (bytes) for numeric variables (mainly useful for character variables; numeric is typically 8 bytes by default in SAS).
- proc print; confirms what was read.
9.2.1.2 Example 2: reading mixed date strings using inputn()
Real-world data often stores dates as text in inconsistent formats.
Here we read two strings: one containing the date text and one containing the informat name. Then we convert dynamically using inputn().
data have;
input char_date : $20. fmt : $20.;
/* set a new variable format*/
num_char=inputn(char_date,fmt);
format num_char date9. ;
cards;
12-08-2015 DDMMYY12.
12/8/2016 MMDDYY12.
05/25/2015 MMDDYY12.
;
run;
proc print;run;
Obs char_date fmt num_char
1 12-08-2015 DDMMYY12. 12AUG2015
2 12/8/2016 MMDDYY12. 08DEC2016
3 05/25/2015 MMDDYY12. 25MAY2015
What’s happening:
- char_date is a character date string.
- fmt is a character variable holding an informat name (like DDMMYY12.).
- inputn(char_date, fmt) converts using the informat stored in fmt.
- format num_char date9.; controls display (e.g., 25MAY2015).
This pattern is extremely useful in messy ingestion pipelines.
9.2.1.3 Example 3: date informat directly in INPUT
If the date format is consistent, you can apply the informat in the input statement directly:
data have;
input char_date MMDDYY10. ;
/* output format */
format char_date date9. ;
cards;
12-08-2015
12/8/2016
05/25/2015
;
run;
proc print;run;
Obs char_date
1 08DEC2015
2 08DEC2016
3 25MAY2015
Here:
- SAS reads a date value using MMDDYY10. regardless of separator style.
- The stored value is numeric (SAS date = days since 01JAN1960), but displayed via date9..
9.2.1.4 Example 4: setting both input and output format consistently
Sometimes you want to keep the same “look” as the input:
data have;
/* set input and output format*/
input char_date MMDDYY10. ;
format char_date MMDDYY10. ;
cards;
12-08-2015
12/8/2016
05/25/2015
;
run;
proc print;run;
Obs char_date
1 12/08/2015
2 12/08/2016
3 05/25/2015
This helps when your deliverable expects a specific format (for example, when exporting to CSV to be read by another system).
9.2.2 Compute mean and frequency
A typical exploratory workflow in SAS uses proc means, proc freq, and proc contents.
proc means data=sashelp.class maxdec=5 ;
run;
proc freq data=sashelp.class ;
table sex;
run;
proc contents data=sashelp.class varnum ;
run;
The MEANS Procedure
Variable N Mean Std Dev Minimum Maximum
-------------------------------------------------------------------------
Age 19 13.31579 1.49267 11.00000 16.00000
Height 19 62.33684 5.12708 51.30000 72.00000
Weight 19 100.02632 22.77393 50.50000 150.00000
-------------------------------------------------------------------------
The FREQ Procedure
Cumulative Cumulative
Sex Frequency Percent Frequency Percent
--------------------------------------------------------
F 9 47.37 9 47.37
M 10 52.63 19 100.00
The CONTENTS Procedure
Data Set Name SASHELP.CLASS Observations 19
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 09/06/2017 21:55:32 Observation Length 40
Last Modified 09/06/2017 21:55:32 Deleted Observations 0
Protection Compressed NO
Data Set Type Sorted NO
Label Student Data
Data Representation WINDOWS_64
Encoding us-ascii ASCII (ANSI)
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 1632
Obs in First Data Page 19
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Program
Files\SASHome\SASFoundation\9.
4\core\sashelp\class.sas7bdat
Release Created 9.0401M5
Host Created X64_SR12R2
Owner Name BUILTIN\Administrators
File Size 128KB
File Size (bytes) 131072
Variables in Creation Order
# Variable Type Len
1 Name Char 8
2 Sex Char 1
3 Age Num 8
4 Height Num 8
5 Weight Num 8
Interpretation:
- proc means summarizes numeric variables (mean, std, min, max, etc.).
- proc freq tabulates categorical variables.
- proc contents varnum shows dataset metadata ordered by variable position.
These are often the first three procs you run when you open a new dataset.
9.2.3 Sort a dataset
Sorting is required for many BY-group operations (including merges and BY processing).
proc sort data=sashelp.class Out= name ;
by name;
Run;
proc print data= name (obs=10) ;
run;
Obs Name Sex Age Height Weight
1 Alfred M 14 69.0 112.5
2 Alice F 13 56.5 84.0
3 Barbara F 13 65.3 98.0
4 Carol F 14 62.8 102.5
5 Henry M 14 63.5 102.5
6 James M 12 57.3 83.0
7 Jane F 12 59.8 84.5
8 Janet F 15 62.5 112.5
9 Jeffrey M 13 62.5 84.0
10 John M 12 59.0 99.5
Note:
- The output dataset name name is legal but can be confusing because name is also a variable in sashelp.class.
- A safer convention is out=class_sorted, but we keep your code unchanged.
9.2.4 Transpose or reshape
Reshaping appears frequently in reporting and longitudinal data workflows.
9.2.4.1 Wide to long
proc transpose converts columns into rows.
proc transpose data= sashelp.class out= Field;
by name;
Run;
proc print data= field (obs=10) ;
run;
Obs Name _NAME_ COL1
1 Alfred Age 14.0
2 Alfred Height 69.0
3 Alfred Weight 112.5
4 Alice Age 13.0
5 Alice Height 56.5
6 Alice Weight 84.0
7 Barbara Age 13.0
8 Barbara Height 65.3
9 Barbara Weight 98.0
10 Carol Age 14.0
A critical detail:
- BY processing requires sorted data by name (or SAS will warn/error depending on settings).
9.2.4.2 Long to wide
A second transpose can rebuild a wide layout, typically using:
- id to name new columns,
- var to specify the values to spread.
proc transpose data= sashelp.class out= Field;
by name;
Run;
proc transpose data=Field out=Field_wide ;
by name ;
id _name_;
var col1;
run;
proc print data= field_wide (obs=10) ;
run;
Obs Name _NAME_ Age Height Weight
1 Alfred COL1 14 69.0 112.5
2 Alice COL1 13 56.5 84.0
3 Barbara COL1 13 65.3 98.0
4 Carol COL1 14 62.8 102.5
5 Henry COL1 14 63.5 102.5
6 James COL1 12 57.3 83.0
7 Jane COL1 12 59.8 84.5
8 Janet COL1 15 62.5 112.5
9 Jeffrey COL1 13 62.5 84.0
10 John COL1 12 59.0 99.5
Conceptually:
- The first transpose creates a “longer” dataset with _name_ and col1.
- The second transpose pivots _name_ levels into columns.
9.2.5 Conditional statements
Conditional logic is fundamental in data steps: creating flags, categories, derived variables, and messages.
9.2.5.1 Simple IF/ELSE
DATA ab;
set sashelp.class;
IF sex="F" then message='A is greater';
Else message='B is greater';
Run;
proc print data=ab (obs=10);
run;
Obs Name Sex Age Height Weight message
1 Alfred M 14 69.0 112.5 B is greater
2 Alice F 13 56.5 84.0 A is greater
3 Barbara F 13 65.3 98.0 A is greater
4 Carol F 14 62.8 102.5 A is greater
5 Henry M 14 63.5 102.5 B is greater
6 James M 12 57.3 83.0 B is greater
7 Jane F 12 59.8 84.5 A is greater
8 Janet F 15 62.5 112.5 A is greater
9 Jeffrey M 13 62.5 84.0 B is greater
10 John M 12 59.0 99.5 B is greater
Even though the message text is arbitrary, the example demonstrates: - IF/ELSE structure, - character literals in quotes, - adding derived variables in a data step.
9.2.5.2 Multiple conditions (ELSE IF)
Data class;
set sashelp.class;
if weight >=100 then weight_cat =3;
else if weight >=90 and weight <100 then weight_cat =2 ;
else weight_cat =1;
attrib weight_cat label = "weight category, 1=Inad, 2=Adeq, 3=Exces";
run;
proc print data=class (obs=3);
run;
weight_
Obs Name Sex Age Height Weight cat
1 Alfred M 14 69.0 112.5 3
2 Alice F 13 56.5 84.0 1
3 Barbara F 13 65.3 98.0 2
This is a standard pattern for category creation: - ordered thresholds, - mutually exclusive categories, - labeling for readable tables.
9.2.5.3 index() for pattern detection
data test;
set sashelp.class;
length gender $10 ;
if index(sex,"M")>0 THEN gender="male";
if index(sex,"F")>0 THEN gender="female";
RUN;
proc print data=test (obs=5);
run;
Obs Name Sex Age Height Weight gender
1 Alfred M 14 69.0 112.5 male
2 Alice F 13 56.5 84.0 female
3 Barbara F 13 65.3 98.0 female
4 Carol F 14 62.8 102.5 female
5 Henry M 14 63.5 102.5 male
Note:
- index() returns the position of a substring (0 if not found).
- Because you used two separate IF statements (not IF/ELSE), the logic is still fine here because sex can’t be both “M” and “F”.
9.2.6 LIKE operation to select rows containing a pattern
SAS where supports SQL-like pattern matching using like with % wildcards.
proc print data =sashelp.class;
where upcase(name) like '%A%';
run;
proc print data =sashelp.class;
where (name) like '%A%';
run;
Obs Name Sex Age Height Weight
1 Alfred M 14 69.0 112.5
2 Alice F 13 56.5 84.0
3 Barbara F 13 65.3 98.0
4 Carol F 14 62.8 102.5
6 James M 12 57.3 83.0
7 Jane F 12 59.8 84.5
8 Janet F 15 62.5 112.5
14 Mary F 15 66.5 112.0
17 Ronald M 15 67.0 133.0
18 Thomas M 11 57.5 85.0
19 William M 15 66.5 112.0
Obs Name Sex Age Height Weight
1 Alfred M 14 69.0 112.5
2 Alice F 13 56.5 84.0
The first query is case-insensitive by forcing uppercase.
The second is case-sensitive, depending on your SAS session settings.
9.2.7 Change format of a variable
User-defined formats map raw values into labeled categories (often for tables).
proc format;
value AGEnew
11 = '1: NEW'
12 = '2: NEW'
13 = '3: NEW'
14 = '4: NEW'
15 = '5: NEW'
16 = '6: NEW'
;
run;
DATA ab;
set sashelp.class;
Format AGE AGEnew.;
Run;
proc freq data=ab;
table AGE;
run;
The FREQ Procedure
Cumulative Cumulative
Age Frequency Percent Frequency Percent
-----------------------------------------------------------
1: NEW 2 10.53 2 10.53
2: NEW 5 26.32 7 36.84
3: NEW 3 15.79 10 52.63
4: NEW 4 21.05 14 73.68
5: NEW 4 21.05 18 94.74
6: NEW 1 5.26 19 100.00
This workflow is typical:
1) define format in proc format,
2) apply format in a data step,
3) display with proc freq.
9.2.8 Basic operations
SAS provides many numeric functions directly in the data step.
Data Mathdata;
A= 10.12345;
B=20;
C= mean (a,b);
D= Min(a,b);
E= Max(a,b);
F = log(a);
G= round(a,0.02);
H= floor(a );
Run;
proc print data=Mathdata;
run;
Data mathdata;
Set sashelp.Iris;
Sum = sum (of SepalLength, SepalWIDTH);
Diff = SepalLength- SepalWIDTH;
Mult = SepalLength* SepalWIDTH;
Div= SepalLength/ SepalWIDTH;
Run;
proc print data=mathdata (obs=10);
run;
Obs A B C D E F G H
1 10.1235 20 15.0617 10.1235 20 2.31485 10.12 10
Sepal Sepal Petal Petal
Obs Species Length Width Length Width Sum Diff Mult Div
1 Setosa 50 33 14 2 83 17 1650 1.51515
2 Setosa 46 34 14 3 80 12 1564 1.35294
3 Setosa 46 36 10 2 82 10 1656 1.27778
4 Setosa 51 33 17 5 84 18 1683 1.54545
5 Setosa 55 35 13 2 90 20 1925 1.57143
6 Setosa 48 31 16 2 79 17 1488 1.54839
7 Setosa 52 34 14 2 86 18 1768 1.52941
8 Setosa 49 36 14 1 85 13 1764 1.36111
9 Setosa 44 32 13 2 76 12 1408 1.37500
10 Setosa 50 35 16 6 85 15 1750 1.42857
Two common teaching points:
- SAS functions like mean() handle missing values differently than plain arithmetic.
- sum(of ...) is a very SAS-specific idiom and scales well when many variables are involved.
9.2.9 Rename variables
9.2.9.1 Rename one variable
Data AB;
set sashelp.class; Rename AGE= AGENEW;
Run;
proc print data=AB;
run;
Obs Name Sex AGENEW Height Weight
1 Alfred M 14 69.0 112.5
2 Alice F 13 56.5 84.0
3 Barbara F 13 65.3 98.0
4 Carol F 14 62.8 102.5
5 Henry M 14 63.5 102.5
6 James M 12 57.3 83.0
7 Jane F 12 59.8 84.5
8 Janet F 15 62.5 112.5
9 Jeffrey M 13 62.5 84.0
10 John M 12 59.0 99.5
11 Joyce F 11 51.3 50.5
12 Judy F 14 64.3 90.0
13 Louise F 12 56.3 77.0
14 Mary F 15 66.5 112.0
15 Philip M 16 72.0 150.0
16 Robert M 12 64.8 128.0
17 Ronald M 15 67.0 133.0
18 Thomas M 11 57.5 85.0
19 William M 15 66.5 112.0
9.2.9.2 Rename multiple variables programmatically
This advanced example shows how to build a rename list from dictionary.columns and apply it.
2 /*rename*/
3 data AAA;
4 set sashelp.class;
5 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.AAA has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
6
7 /*create a list*/
8 proc sql noprint;
9 select cats(name,'=',name,'_new')
10 into : lis
11 separated by ' '
12 from dictionary.columns
13 where libname = 'WORK' and memname = 'AAA';
14 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
15 %put &lis;
Name=Name_new Sex=Sex_new Age=Age_new Height=Height_new Weight=Weight_new
16
17 proc datasets library = work nolist;
18 modify AAA;
19 rename &lis;
NOTE: Renaming variable Name to Name_new.
NOTE: Renaming variable Sex to Sex_new.
NOTE: Renaming variable Age to Age_new.
NOTE: Renaming variable Height to Height_new.
NOTE: Renaming variable Weight to Weight_new.
20 quit;
NOTE: MODIFY was successful for WORK.AAA.DATA.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
21
22 proc print data=aaa (obs=2);
23 run;
NOTE: There were 2 observations read from the data set WORK.AAA.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
24
25 /*function to deal with text*/
26 %scan(&lis,2);
NOTE: Line generated by the macro function "SCAN".
26 Sex=Sex_new
___
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
27 /*#scan variable*/
28 %substr(NAME, %length(NAME),1);
NOTE: Line generated by the macro function "SUBSTR".
28 E
_
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
29 /*#extract character*/
30 %substr(NAME, 2,1);
NOTE: Line generated by the macro function "SUBSTR".
30 A
_
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
31 %index(&lis,"Age=Age_new") ;
NOTE: Line generated by the macro function "INDEX".
31 0
_
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
32 /*#identify this exists */
ERROR: Errors printed on page 1.
Height_ Weight_
Obs Name_new Sex_new Age_new new new
1 Alfred M 14 69.0 112.5
2 Alice F 13 56.5 84.0
Why this matters: - In real clinical trial pipelines, you often receive datasets with inconsistent naming conventions. - Generating rename rules programmatically reduces manual error and improves reproducibility.
9.2.10 Text manipulation
9.2.10.1 Extract text from a character value
Data Mathdata;
Text = "Hello World";
Text1= substr(Text, 6, 2);
Run;
proc print data=mathdata ;
run;
Obs Text Text1
1 Hello World W
substr() is one of the most-used string functions in clinical reporting.
9.2.10.2 Convert character to numeric and reverse
Character → numeric:
Data ABC;
set sashelp.class;
agenew= input (age, best.);
Run;
proc print data=abc ;
run;
Obs Name Sex Age Height Weight agenew
1 Alfred M 14 69.0 112.5 14
2 Alice F 13 56.5 84.0 13
3 Barbara F 13 65.3 98.0 13
4 Carol F 14 62.8 102.5 14
5 Henry M 14 63.5 102.5 14
6 James M 12 57.3 83.0 12
7 Jane F 12 59.8 84.5 12
8 Janet F 15 62.5 112.5 15
9 Jeffrey M 13 62.5 84.0 13
10 John M 12 59.0 99.5 12
11 Joyce F 11 51.3 50.5 11
12 Judy F 14 64.3 90.0 14
13 Louise F 12 56.3 77.0 12
14 Mary F 15 66.5 112.0 15
15 Philip M 16 72.0 150.0 16
16 Robert M 12 64.8 128.0 12
17 Ronald M 15 67.0 133.0 15
18 Thomas M 11 57.5 85.0 11
19 William M 15 66.5 112.0 15
Numeric → character:
Data ABC;
set sashelp.class;
agenew= put (age, best.);
Run;
proc contents data=abc ;
run;
The CONTENTS Procedure
Data Set Name WORK.ABC Observations 19
Member Type DATA Variables 6
Engine V9 Indexes 0
Created 02/20/2026 16:44:41 Observation Length 48
Last Modified 02/20/2026 16:44:41 Deleted Observations 0
Protection Compressed NO
Data Set Type Sorted NO
Label
Data Representation WINDOWS_64
Encoding wlatin1 Western (Windows)
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 1361
Obs in First Data Page 19
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\hed2\AppData\Local\Temp\SAS Temporary
Files\_TD19308_HDW02234493_\abc.sas7bdat
Release Created 9.0401M5
Host Created X64_10PRO
Owner Name NIH\hed2
File Size 128KB
File Size (bytes) 131072
Alphabetic List of Variables and Attributes
# Variable Type Len
3 Age Num 8
4 Height Num 8
1 Name Char 8
2 Sex Char 1
5 Weight Num 8
6 agenew Char 12
In practice:
- input() reads a character string into numeric.
- put() formats numeric into character.
9.2.10.3 Change the length of a variable
Data ABC;
set sashelp.class;
Length agenew $10.;
Label agenew=“New age”;
Run;
proc contents data=abc ;
run;
The CONTENTS Procedure
Data Set Name WORK.ABC Observations 19
Member Type DATA Variables 6
Engine V9 Indexes 0
Created 02/20/2026 16:44:42 Observation Length 48
Last Modified 02/20/2026 16:44:42 Deleted Observations 0
Protection Compressed NO
Data Set Type Sorted NO
Label
Data Representation WINDOWS_64
Encoding wlatin1 Western (Windows)
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 1361
Obs in First Data Page 19
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\hed2\AppData\Local\Temp\SAS Temporary
Files\_TD24572_HDW02234493_\abc.sas7bdat
Release Created 9.0401M5
Host Created X64_10PRO
Owner Name NIH\hed2
File Size 128KB
File Size (bytes) 131072
Alphabetic List of Variables and Attributes
# Variable Type Len Label
3 Age Num 8
4 Height Num 8
1 Name Char 8
2 Sex Char 1
5 Weight Num 8
6 agenew Char 10 âNew ageâ
A key habit: define length before assignment when creating new character variables to avoid truncation.
9.2.11 Create a report
proc report is a flexible reporting tool and can replace many proc print / proc tabulate workflows.
proc report data=sashelp.class;
Column age;
Define age / display;
Run;
Age
14
13
13
14
14
12
12
15
13
12
11
14
12
15
16
12
15
11
15
Even this minimal example shows the structure:
- column defines the layout,
- define controls how variables appear.
9.2.12 Random variables
SAS provides pseudo-random generators via rand().
data ab;
set sashelp.class;
num=rand("normal");
run;
proc print data=ab (obs=10);
run;
Obs Name Sex Age Height Weight num
1 Alfred M 14 69.0 112.5 -0.44832
2 Alice F 13 56.5 84.0 1.55149
3 Barbara F 13 65.3 98.0 -0.95175
4 Carol F 14 62.8 102.5 -0.66781
5 Henry M 14 63.5 102.5 0.25197
6 James M 12 57.3 83.0 -1.64294
7 Jane F 12 59.8 84.5 0.96083
8 Janet F 15 62.5 112.5 -0.45714
9 Jeffrey M 13 62.5 84.0 -0.63351
10 John M 12 59.0 99.5 -1.00366
This is useful for simulation, multiple imputation diagnostics, and resampling.
9.2.13 Combine two texts, compress spaces, locate a substring, change case
These are common in cleaning messy character fields.
Combine:
Data Mathdata;
Text = "Hello";
Text1= "World";
Text2= text || " " ||text1;
Run;
proc print data=Mathdata ;
run;
Obs Text Text1 Text2
1 Hello World Hello World
Compress:
Data Mathdata;
Text = "Hello World ";
Text1= trim(text);
Text2= compress(text);
Run;
proc print data=Mathdata ;
run;
Obs Text Text1 Text2
1 Hello World Hello World HelloWorld
Index:
Data Mathdata;
Text = "Hello World";
indextext= index(text, "or");
Run;
proc print data=Mathdata ;
run;
Obs Text indextext
1 Hello World 8
Case functions:
Data Mathdata;
Text = "Hello World";
upcase= upcase(text );
lowcase= lowcase(text );
propcase= propcase(text );
Run;
proc print data=Mathdata ;
run;
Obs Text upcase lowcase propcase
1 Hello World HELLO WORLD hello world Hello World
9.2.14 Deduplication
Deduplication is a frequent preprocessing step, especially when identifying unique subjects, visits, or records.
9.2.14.1 Method 1: nodupkeys
proc sort data = sashelp.class out = dedup
nodupkeys;
by height;
run;
proc print data= dedup;
run;
Obs Name Sex Age Height Weight
1 Joyce F 11 51.3 50.5
2 Louise F 12 56.3 77.0
3 Alice F 13 56.5 84.0
4 James M 12 57.3 83.0
5 Thomas M 11 57.5 85.0
6 John M 12 59.0 99.5
7 Jane F 12 59.8 84.5
8 Janet F 15 62.5 112.5
9 Carol F 14 62.8 102.5
10 Henry M 14 63.5 102.5
11 Judy F 14 64.3 90.0
12 Robert M 12 64.8 128.0
13 Barbara F 13 65.3 98.0
14 Mary F 15 66.5 112.0
15 Ronald M 15 67.0 133.0
16 Alfred M 14 69.0 112.5
17 Philip M 16 72.0 150.0
This keeps the first record for each unique height.
9.2.14.2 Method 2: BY-group with first. / last.
proc sort data=sashelp.class out=data_rank; by height ; run;
data outdata dropdata;
set data_rank;
by height ;
if last.height then output outdata;
else output dropdata;
run;
proc print data=dropdata;run;
Obs Name Sex Age Height Weight
1 Janet F 15 62.5 112.5
2 Mary F 15 66.5 112.0
This version keeps the last record per height.
Notice that the output depends on the current sort order.
9.2.15 Select a subset of rows
data where;
set sashelp.class;
where sex ne "F";
/*if */
run;
proc print data= where;
run;
Obs Name Sex Age Height Weight
1 Alfred M 14 69.0 112.5
2 Henry M 14 63.5 102.5
3 James M 12 57.3 83.0
4 Jeffrey M 13 62.5 84.0
5 John M 12 59.0 99.5
6 Philip M 16 72.0 150.0
7 Robert M 12 64.8 128.0
8 Ronald M 15 67.0 133.0
9 Thomas M 11 57.5 85.0
10 William M 15 66.5 112.0
The where statement filters rows during reading, which is typically more efficient than filtering after reading.
9.2.16 Create macros with DO loops
Macros provide automation, especially when the same PROC is repeated across multiple variables.
9.2.16.1 A macro to calculate descriptive stats across multiple variables
This example uses:
- proc means to compute summaries by sex,
- ODS output to capture results,
- a macro loop to run across variables.
2 %macro means(var_avg) ;
3
4 /*calculate means*/
5 proc means data=sashelp.class StackODSOutput n mean std min p5
5 ! p95 max nmiss;
6 var &var_avg;
7 class sex;
8 ods output summary=result2;
9 run;
10
11 /*append then output*/
12 data masterresult2; * combine results;
13 set masterresult2 result2;
14 run;
15
16 %mend means;
17
18 /*use macro to merge all descriptive stats */
19 data masterresult2 ;
20 set _null_;
21 run;
NOTE: The data set WORK.MASTERRESULT2 has 0 observations and 0 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
22
23 %let vars=
24 age
25 height
26 weight
27 ;
28
29 %macro model ;
30 %do i=1 %to %sysfunc(countw(&vars));
31
32 %let x=%scan(&vars,&i);
33 %means( &x )
34
35 %end;
36 %mend model;
37
38 %model;
NOTE: The data set WORK.RESULT2 has 2 observations and 12 variables.
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The PROCEDURE MEANS printed page 1.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.07 seconds
cpu time 0.00 seconds
NOTE: There were 0 observations read from the data set WORK.MASTERRESULT2.
NOTE: There were 2 observations read from the data set WORK.RESULT2.
NOTE: The data set WORK.MASTERRESULT2 has 2 observations and 12 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: The data set WORK.RESULT2 has 2 observations and 12 variables.
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The PROCEDURE MEANS printed page 2.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
WARNING: Multiple lengths were specified for the variable Variable by
input data set(s). This can cause truncation of data.
NOTE: There were 2 observations read from the data set WORK.MASTERRESULT2.
NOTE: There were 2 observations read from the data set WORK.RESULT2.
NOTE: The data set WORK.MASTERRESULT2 has 4 observations and 12 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
NOTE: The data set WORK.RESULT2 has 2 observations and 12 variables.
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The PROCEDURE MEANS printed page 3.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
WARNING: Multiple lengths were specified for the variable Variable by
input data set(s). This can cause truncation of data.
NOTE: There were 4 observations read from the data set WORK.MASTERRESULT2.
NOTE: There were 2 observations read from the data set WORK.RESULT2.
NOTE: The data set WORK.MASTERRESULT2 has 6 observations and 12 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
39
40 proc print data= masterresult2;
41 run;
NOTE: There were 6 observations read from the data set WORK.MASTERRESULT2.
NOTE: The PROCEDURE PRINT printed page 4.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
42
The MEANS Procedure
Analysis Variable : Age
N
Sex Obs N Mean Std Dev Minimum 5th Pctl
--------------------------------------------------------------------------
F 9 9 13.222222 1.394433 11.000000 11.000000
M 10 10 13.400000 1.646545 11.000000 11.000000
--------------------------------------------------------------------------
Analysis Variable : Age
N N
Sex Obs 95th Pctl Maximum Miss
------------------------------------------------
F 9 15.000000 15.000000 0
M 10 16.000000 16.000000 0
------------------------------------------------
The MEANS Procedure
Analysis Variable : Height
N
Sex Obs N Mean Std Dev Minimum 5th Pctl
--------------------------------------------------------------------------
F 9 9 60.588889 5.018328 51.300000 51.300000
M 10 10 63.910000 4.937937 57.300000 57.300000
--------------------------------------------------------------------------
Analysis Variable : Height
N N
Sex Obs 95th Pctl Maximum Miss
------------------------------------------------
F 9 66.500000 66.500000 0
M 10 72.000000 72.000000 0
------------------------------------------------
The MEANS Procedure
Analysis Variable : Weight
N
Sex Obs N Mean Std Dev Minimum 5th Pctl
--------------------------------------------------------------------------
F 9 9 90.111111 19.383914 50.500000 50.500000
M 10 10 108.950000 22.727186 83.000000 83.000000
--------------------------------------------------------------------------
Analysis Variable : Weight
N N
Sex Obs 95th Pctl Maximum Miss
------------------------------------------------
F 9 112.500000 112.500000 0
M 10 150.000000 150.000000 0
------------------------------------------------
Obs Sex NObs _control_ Variable N Mean StdDev
1 F 9 Age 9 13.222222 1.394433
2 M 10 1 Age 10 13.400000 1.646545
3 F 9 Hei 9 60.588889 5.018328
4 M 10 1 Hei 10 63.910000 4.937937
5 F 9 Wei 9 90.111111 19.383914
6 M 10 1 Wei 10 108.950000 22.727186
Obs Min P5 P95 Max NMiss
1 11.000000 11.000000 15.000000 15.000000 0
2 11.000000 11.000000 16.000000 16.000000 0
3 51.300000 51.300000 66.500000 66.500000 0
4 57.300000 57.300000 72.000000 72.000000 0
5 50.500000 50.500000 112.500000 112.500000 0
6 83.000000 83.000000 150.000000 150.000000 0
Practical interpretation:
- &vars defines the batch of variables.
- %do iterates through them.
- Each iteration runs proc means and appends results into a single dataset.
This is exactly the type of pattern used in clinical reporting pipelines when building shells.
9.2.17 Output intermediate tables with ODS
ODS is the bridge from SAS procedures to reusable datasets.
9.2.17.1 Discover table names: ods trace
ods trace on;
proc freq data=sashelp.class ;
table sex;
run;
ods trace off;
The FREQ Procedure
Cumulative Cumulative
Sex Frequency Percent Frequency Percent
--------------------------------------------------------
F 9 47.37 9 47.37
M 10 52.63 19 100.00
ods trace tells you which internal table objects SAS produces.
9.2.17.2 Capture a specific output table
ods listing close;
ods trace on;
ods output ParameterEstimates= ParameterEstimates ;
proc glm data=sashelp.class;
model height=age;
run;
ods trace off;
ods listing;
proc print data=ParameterEstimates;
run;
Obs Dependent Parameter Estimate StdErr tValue Probt
1 Height Intercept 25.22388451 6.52168912 3.87 0.0012
2 Height Age 2.78713911 0.48688163 5.72 <.0001
This pattern is essential when you want to: - compute results in SAS, - feed them into later steps, - export them, or - compare them against R outputs.
9.2.18 Create sequence numbers
Sequence variables are common in SDTM/ADaM creation and tracking.
9.2.18.1 Grouped sequence numbers
/* grouped*/
proc sort data=sashelp.class out=class ; by name; run;
data temp; set class;
by name;
count + 1;
if first.name then count = 1;
run;
proc print data=temp (obs=3);run;
proc sort data=class out=class2 ; by height; run;
data class3;
set class2;
by height;
retain count;
if first.height then count=1;
else count=count+1;
run;
proc print data=class3 (obs=3);run;
/*nogrouped*/
data new;
set class;
seqno = _n_;
run;
proc print data=new (obs=3);run;
Obs Name Sex Age Height Weight count
1 Alfred M 14 69.0 112.5 1
2 Alice F 13 56.5 84.0 1
3 Barbara F 13 65.3 98.0 1
Obs Name Sex Age Height Weight count
1 Joyce F 11 51.3 50.5 1
2 Louise F 12 56.3 77.0 1
3 Alice F 13 56.5 84.0 1
Obs Name Sex Age Height Weight seqno
1 Alfred M 14 69.0 112.5 1
2 Alice F 13 56.5 84.0 2
3 Barbara F 13 65.3 98.0 3
Key idea:
- _n_ is the observation index within the data step.
- first. and last. require sorted data and allow per-group logic.
9.2.19 Merge datasets
SAS has two main approaches:
- SQL joins (proc sql) — flexible and readable,
- DATA step merge — fast and common in production, but requires sorting and careful logic.
9.2.19.1 Left and right join
2 data class2;
3 input name $ score;
4 datalines;
NOTE: The data set WORK.CLASS2 has 3 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds
8 ;
9 run;
10
11 proc sql;
12 create table class_c as
13 select *
14 from sashelp.class as a
15 left join class2 as b
16 on a.name = b.name;
WARNING: Variable name already exists on file WORK.CLASS_C.
NOTE: Table WORK.CLASS_C created, with 19 rows and 6 columns.
17 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
18
19 proc print data=class_c;
20 run;
NOTE: There were 19 observations read from the data set WORK.CLASS_C.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds
21
22 proc sql;
23 create table class_d as
24 select *
25 from class2 as a
26 right join sashelp.class as b
27 on a.name = b.name;
WARNING: Variable Name already exists on file WORK.CLASS_D.
NOTE: Table WORK.CLASS_D created, with 19 rows and 6 columns.
28 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
29
30 proc print data=class_d;
31 run;
NOTE: There were 19 observations read from the data set WORK.CLASS_D.
NOTE: The PROCEDURE PRINT printed page 2.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
32
Obs Name Sex Age Height Weight score
1 Alfred M 14 69.0 112.5 85
2 Alice F 13 56.5 84.0 89
3 Barbara F 13 65.3 98.0 .
4 Carol F 14 62.8 102.5 .
5 Henry M 14 63.5 102.5 .
6 James M 12 57.3 83.0 .
7 Jane F 12 59.8 84.5 .
8 Janet F 15 62.5 112.5 .
9 Jeffrey M 13 62.5 84.0 .
10 John M 12 59.0 99.5 .
11 Joyce F 11 51.3 50.5 .
12 Judy F 14 64.3 90.0 .
13 Louise F 12 56.3 77.0 .
14 Mary F 15 66.5 112.0 .
15 Philip M 16 72.0 150.0 .
16 Robert M 12 64.8 128.0 .
17 Ronald M 15 67.0 133.0 .
18 Thomas M 11 57.5 85.0 .
19 William M 15 66.5 112.0 .
Obs name score Sex Age Height Weight
1 Alfred 85 M 14 69.0 112.5
2 Alice 89 F 13 56.5 84.0
3 . F 13 65.3 98.0
4 . F 14 62.8 102.5
5 . M 14 63.5 102.5
6 . M 12 57.3 83.0
7 . F 12 59.8 84.5
8 . F 15 62.5 112.5
9 . M 13 62.5 84.0
10 . M 12 59.0 99.5
11 . F 11 51.3 50.5
12 . F 14 64.3 90.0
13 . F 12 56.3 77.0
14 . F 15 66.5 112.0
15 . M 16 72.0 150.0
16 . M 12 64.8 128.0
17 . M 15 67.0 133.0
18 . M 11 57.5 85.0
19 . M 15 66.5 112.0
9.2.19.2 Full join
2 data class2;
3 input name $ score;
4 datalines;
NOTE: The data set WORK.CLASS2 has 3 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
8 ;
9 run;
10
11 proc sql;
12 create table class_e as
13 select *
14 from sashelp.class as a
15 full join class2 as b
16 on a.name = b.name;
WARNING: Variable name already exists on file WORK.CLASS_E.
NOTE: Table WORK.CLASS_E created, with 20 rows and 6 columns.
17 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
18
19 proc print data=class_e;
20 run;
NOTE: There were 20 observations read from the data set WORK.CLASS_E.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.06 seconds
cpu time 0.06 seconds
21
Obs Name Sex Age Height Weight score
1 Alfred M 14 69.0 112.5 85
2 Alice F 13 56.5 84.0 89
3 Barbara F 13 65.3 98.0 .
4 Carol F 14 62.8 102.5 .
5 . . . 99
6 Henry M 14 63.5 102.5 .
7 James M 12 57.3 83.0 .
8 Jane F 12 59.8 84.5 .
9 Janet F 15 62.5 112.5 .
10 Jeffrey M 13 62.5 84.0 .
11 John M 12 59.0 99.5 .
12 Joyce F 11 51.3 50.5 .
13 Judy F 14 64.3 90.0 .
14 Louise F 12 56.3 77.0 .
15 Mary F 15 66.5 112.0 .
16 Philip M 16 72.0 150.0 .
17 Robert M 12 64.8 128.0 .
18 Ronald M 15 67.0 133.0 .
19 Thomas M 11 57.5 85.0 .
20 William M 15 66.5 112.0 .
9.2.19.3 DATA step merge (append-style merge by key)
data class2;
input name $ score;
datalines;
Alfred 85
Alice 89
Daniel 99
;
run;
data class_f;
merge sashelp.class class2;
by name;
/*if A and B; */
run;
proc print data=class_f;
run;
Obs Name Sex Age Height Weight score
1 Alfred M 14 69.0 112.5 85
2 Alice F 13 56.5 84.0 89
3 Barbara F 13 65.3 98.0 .
4 Carol F 14 62.8 102.5 .
5 Daniel . . . 99
6 Henry M 14 63.5 102.5 .
7 James M 12 57.3 83.0 .
8 Jane F 12 59.8 84.5 .
9 Janet F 15 62.5 112.5 .
10 Jeffrey M 13 62.5 84.0 .
11 John M 12 59.0 99.5 .
12 Joyce F 11 51.3 50.5 .
13 Judy F 14 64.3 90.0 .
14 Louise F 12 56.3 77.0 .
15 Mary F 15 66.5 112.0 .
16 Philip M 16 72.0 150.0 .
17 Robert M 12 64.8 128.0 .
18 Ronald M 15 67.0 133.0 .
19 Thomas M 11 57.5 85.0 .
20 William M 15 66.5 112.0 .
9.2.19.4 Inner join
2 data class2;
3 input name $ score;
4 datalines;
NOTE: The data set WORK.CLASS2 has 3 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.06 seconds
8 ;
9 run;
10
11 proc sql;
12 create table class_e as
13 select *
14 from sashelp.class as a
15 inner join class2 as b
16 on a.name = b.name;
WARNING: Variable name already exists on file WORK.CLASS_E.
NOTE: Table WORK.CLASS_E created, with 2 rows and 6 columns.
17 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.06 seconds
cpu time 0.01 seconds
18
19 proc print data=class_e;
20 run;
NOTE: There were 2 observations read from the data set WORK.CLASS_E.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.05 seconds
cpu time 0.04 seconds
21
Obs Name Sex Age Height Weight score
1 Alfred M 14 69.0 112.5 85
2 Alice F 13 56.5 84.0 89
9.2.19.5 Minus join (anti-join)
2 data class2;
3 input name $ score;
4 datalines;
NOTE: The data set WORK.CLASS2 has 3 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
8 ;
9 run;
10
11 proc sql;
12 create table class_e as
13 select *
14 from sashelp.class as a
15 left join class2 as b
16 on a.name = b.name
17 where b.name is NULL;
WARNING: Variable name already exists on file WORK.CLASS_E.
NOTE: Table WORK.CLASS_E created, with 17 rows and 6 columns.
18 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
19
20 proc print data=class_e;
21 run;
NOTE: There were 17 observations read from the data set WORK.CLASS_E.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
Obs Name Sex Age Height Weight score
1 Barbara F 13 65.3 98.0 .
2 Carol F 14 62.8 102.5 .
3 Henry M 14 63.5 102.5 .
4 James M 12 57.3 83.0 .
5 Jane F 12 59.8 84.5 .
6 Janet F 15 62.5 112.5 .
7 Jeffrey M 13 62.5 84.0 .
8 John M 12 59.0 99.5 .
9 Joyce F 11 51.3 50.5 .
10 Judy F 14 64.3 90.0 .
11 Louise F 12 56.3 77.0 .
12 Mary F 15 66.5 112.0 .
13 Philip M 16 72.0 150.0 .
14 Robert M 12 64.8 128.0 .
15 Ronald M 15 67.0 133.0 .
16 Thomas M 11 57.5 85.0 .
17 William M 15 66.5 112.0 .
This is the SAS SQL equivalent of “keep rows in A that do not match B”.
9.2.20 Create Table 1
In clinical reporting, “Table 1” typically summarizes baseline demographics and characteristics by treatment group.
Your code calls external macro files via %include.
This is a realistic production pattern: statistical programming teams maintain shared macro libraries, and analyses import them.
2 *** Load utility macros;
3 %include
3 ! "C:\Users\hed2\Downloads\mybook2\mybook2\sasmacro\create_table1.
3 ! sas";
WARNING: Physical file does not exist,
C:\Users\hed2\Downloads\mybook2\mybook2\sasmacro\create_table1.sas
.
ERROR: Cannot open %INCLUDE file
C:\Users\hed2\Downloads\mybook2\mybook2\sasmacro\create_table1.sas.
4
5 *** Specify input and output data sets, and the column
5 ! variable.;
6 %let INPUT_DATA = sashelp.class;
7 %let OUTPUT_DATA = Table1 ;
8 %let COLVAR = sex;
9
10 /*chort*/
11 *** %AddText(text=Height);
12 *** %CategoricalRowVar2(rowVar=);
13 %ContinuousRowVar2(rowVar=height );
_
180
WARNING: Apparent invocation of macro CONTINUOUSROWVAR2 not resolved.
ERROR 180-322: Statement is not valid or it is used out of proper order.
WARNING: Apparent invocation of macro CONTINUOUSROWVAR2 not resolved.
14 *** %AddText(text=);
15
16 %ContinuousRowVar2(rowVar=weight );
_
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
WARNING: Apparent invocation of macro CONTINUOUSROWVAR2 not resolved.
17 %ContinuousRowVar2(rowVar=Age );
_
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
18
19 proc print data= table1;
ERROR: File WORK.TABLE1.DATA does not exist.
20 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: SAS set option OBS=0 and will continue to check statements.
This might cause NOTE: No observations in data set.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds
21
22 /* Export Table 1 as a CSV file*/
NOTE: PROCEDURE EXPORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: The SAS System stopped processing this step because of errors.
23 proc export data=Table1 replace label
24 outfile="C:\Users\hed2\Downloads\mybook2\mybook2\sasmacro\ta
24 ! ble1.csv"
25 dbms=csv;
26 run;
27
ERROR: Errors printed on page 1.
Interpretation:
- Macro calls like %ContinuousRowVar2(...) likely append rows into the Table 1 dataset.
- proc export creates a portable deliverable (CSV) for review, QA, or publication workflows.
9.3 Using macros
Macros are the “automation layer” of SAS. They help you: - avoid copy/paste, - standardize reports, - scale analyses to many endpoints/variables.
9.3.0.1 Create a macro variable from a statistic
2 proc sql noprint;
3 select std(age) format=best32.
4 into :age_mean
5 from sashelp.class;
6 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
7
8 %put Mean of age: &age_mean;
Mean of age: 1.49267215939689
9 %putlog Mean of age: &age_mean;
_
180
WARNING: Apparent invocation of macro PUTLOG not resolved.
ERROR 180-322: Statement is not valid or it is used out of proper order.
ERROR: Errors printed on page 1.
This is a standard trick: - compute something with PROC SQL, - store into a macro variable, - reuse it later in titles, footnotes, conditions, or other macros.
9.3.0.2 Select variable names from metadata
SAS metadata tables (dictionary.columns) are extremely powerful for programmatic workflows.
proc sql;
select name
into :vars separated by ' '
from dictionary.columns
where libname="SASHELP" and
memname="CLASS" and varnum=4;
;
quit;
%put &vars.;
Column Name
--------------------------------
Height
This pattern is the foundation of “dynamic programming” in SAS: - detect variables automatically, - loop over them, - generate consistent outputs.
9.3.0.3 Iterative DO loops
DATA step loops are for row-by-row computations and simulation-like constructions.
data do_to;
x=10;
y="yes";
do i=1 to 10;
x=x+1;
output;
end;
run;
proc print data=do_to;
run;
Obs x y i
1 11 yes 1
2 12 yes 2
3 13 yes 3
4 14 yes 4
5 15 yes 5
6 16 yes 6
7 17 yes 7
8 18 yes 8
9 19 yes 9
10 20 yes 10
do while loops are useful for process-like simulations:
data loan;
balance=1000;
payment=0;
do while (balance>0);
balance=balance-100;
payment=payment+1;
output;
end;
run;
proc print data=loan;
run;
Obs balance payment
1 900 1
2 800 2
3 700 3
4 600 4
5 500 5
6 400 6
7 300 7
8 200 8
9 100 9
10 0 10
9.3.0.4 DO loop inside a macro
This example illustrates how macros generate SAS code and then run it.
%macro run_calculation(amt, t, r);
data customer_value;
i=&r./10.;
do n=0 to &t.;
S=&amt.*((1+i)*n - 1)/i; /*Power operator */
output; /*output s*/
end;
file print;
putlog s @@;
put s @@;
run;
proc print data=customer_value;
run;
%mend;
%run_calculation(amt=100, t=10, r=7);
2285.7142857
Obs i n S
1 0.7 0 -142.86
2 0.7 1 100.00
3 0.7 2 342.86
4 0.7 3 585.71
5 0.7 4 828.57
6 0.7 5 1071.43
7 0.7 6 1314.29
8 0.7 7 1557.14
9 0.7 8 1800.00
10 0.7 9 2042.86
11 0.7 10 2285.71
Even when the formula is toy-like, the structure mirrors production macro design: - macro parameters define the scenario, - data step creates a dataset, - output is printed or reported.
9.3.1 PROC REPORT for flexible summaries
Here we generate a summary dataset with proc means, then format it with proc report.
proc means data=sashelp.class;
var _NUMERIC_; /*_CHARACTER_*/
output out=want mean= sum= max= /autoname;
run;
proc report data= want;
Column _FREQ_;
Column Age_Mean;
Column Age_Sum;
Column Age_max;
Define _FREQ_ /"The total number" display;
Define Age_mean /"Mean of age" display;
Define Age_sum /"Sum of age" display;
Define Age_max /"Max of age" display;
Run;
The MEANS Procedure
Variable N Mean Std Dev Minimum Maximum
-------------------------------------------------------------------------
Age 19 13.3157895 1.4926722 11.0000000 16.0000000
Height 19 62.3368421 5.1270752 51.3000000 72.0000000
Weight 19 100.0263158 22.7739335 50.5000000 150.0000000
-------------------------------------------------------------------------
The
total Mean of Sum of Max of
number age age age
19 13.315789 253 16
Teaching point:
- proc means creates the statistics,
- proc report controls presentation.
This split is clean and scalable.
9.4 Use SAS formats
Formats provide consistent labeling, grouping, and display standards—especially important in regulated reporting.
9.4.0.1 Creating labels for different values
PROC FORMAT;
VALUE LIKERT7_A
1,2,3 = "Disagree"
4 = "Neither Agree nor Disagree"
5,6,7 = "Agree"
RUN;
PROC FORMAT;
VALUE INCOME
LOW -< 20000 = "Low"
20000 -< 60000 = "Middle"
60000 - HIGH = "High";
RUN;
PROC FORMAT;
VALUE RACE
1 = "White"
2 = "Black"
OTHER = "Other";
RUN;
PROC FORMAT;
VALUE $GENDERLABEL
"M" = "Male"
"F" = "Female";
RUN;
DATA sample;
SET sashelp.class;
FORMAT sex GENDERLABEL. ;
RUN;
proc freq data=sample;
table sex;
run;
The FREQ Procedure
Cumulative Cumulative
Sex Frequency Percent Frequency Percent
-----------------------------------------------------------
Female 9 47.37 9 47.37
Male 10 52.63 19 100.00
Formats are not only cosmetic—they also define category behavior in many procedures.
9.4.1 Storing formats to a library
When you create formats in work, they disappear after the session ends.
A permanent format library allows reuse across projects and reports.
9.4.1.1 Way 1: store in a dedicated format library and use FMTSEARCH
2 LIBNAME format
2 ! "C:\Users\hed2\Downloads\code-storage\code\format";
NOTE: Library FORMAT does not exist.
3
4 PROC FORMAT LIBRARY=format ;
ERROR: Library FORMAT does not exist.
5 VALUE $GENDER
6 "M" = "Male_new"
7 "F" = "Female_new";
NOTE: The previous statement has been deleted.
8 RUN;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: The SAS System stopped processing this step because of errors.
9
10 OPTIONS FMTSEARCH=(format);
11 DATA sample;
12 SET sashelp.class;
13 FORMAT sex GENDER. ;
_______
48
ERROR 48-59: The format $GENDER was not found or could not be loaded.
14 RUN;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: SAS set option OBS=0 and will continue to check statements.
This might cause NOTE: No observations in data set.
WARNING: The data set WORK.SAMPLE may be incomplete. When this step was
stopped there were 0 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
15
16 proc freq data=sample;
17 table sex;
18 run;
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
ERROR: Errors printed on page 1.
Interpretation:
- LIBNAME format points to a permanent folder.
- PROC FORMAT LIBRARY=format stores formats there.
- FMTSEARCH tells SAS where to look for formats.
9.4.1.2 Way 2: include a SAS file that defines formats
2 %INCLUDE
2 ! 'C:\Users\hed2\Downloads\code-storage\code\format\format_test.sa
2 ! s';
WARNING: Physical file does not exist,
C:\Users\hed2\Downloads\code-storage\code\format\format_test.sas.
ERROR: Cannot open %INCLUDE file
C:\Users\hed2\Downloads\code-storage\code\format\format_test.sas.
3 DATA sample;
4 SET sashelp.class;
5 FORMAT sex GENDERsex. ;
__________
48
ERROR 48-59: The format $GENDERSEX was not found or could not be loaded.
6 RUN;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: SAS set option OBS=0 and will continue to check statements.
This might cause NOTE: No observations in data set.
WARNING: The data set WORK.SAMPLE may be incomplete. When this step was
stopped there were 0 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
7
8 proc freq data=sample;
9 table sex;
10 run;
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
ERROR: Errors printed on page 1.
This is common in shared programming environments:
- formats are maintained in version-controlled .sas files,
- projects include them as needed.
9.4.1.3 Way 3: add formats to an existing format library
2 LIBNAME format
2 ! "C:\Users\hed2\Downloads\code-storage\code\format";
NOTE: Library FORMAT does not exist.
3
4 PROC FORMAT LIBRARY=format ;
ERROR: Library FORMAT does not exist.
5 VALUE $GENDER
6 "M" = "Male_new"
7 "F" = "Female_new";
NOTE: The previous statement has been deleted.
8 RUN;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: The SAS System stopped processing this step because of errors.
9
10 PROC FORMAT LIBRARY=format ;
ERROR: Library FORMAT does not exist.
11 VALUE $gendersextwo
12 "M" = "Male_new2_two"
13 "F" = "Female_new2_two";
NOTE: The previous statement has been deleted.
14 RUN;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: The SAS System stopped processing this step because of errors.
15
16 OPTIONS FMTSEARCH=(format);
17 DATA sample;
18 SET sashelp.class;
19 FORMAT sex GENDERsextwo. ;
_____________
48
ERROR 48-59: The format $GENDERSEXTWO was not found or could not be loaded.
20 RUN;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: SAS set option OBS=0 and will continue to check statements.
This might cause NOTE: No observations in data set.
WARNING: The data set WORK.SAMPLE may be incomplete. When this step was
stopped there were 0 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
21
22 proc freq data=sample;
23 table sex;
24 run;
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
ERROR: Errors printed on page 1.
This demonstrates that format libraries are extendable: you can keep adding definitions over time.
9.4.1.4 Way 4: apply formats in batch via %include
2 LIBNAME format
2 ! "C:\Users\hed2\Downloads\code-storage\code\format";
NOTE: Library FORMAT does not exist.
3
4 PROC FORMAT LIBRARY=format ;
ERROR: Library FORMAT does not exist.
5 VALUE $GENDER
6 "M" = "Male_new"
7 "F" = "Female_new";
NOTE: The previous statement has been deleted.
8 RUN;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: The SAS System stopped processing this step because of errors.
9
10 OPTIONS FMTSEARCH=(format);
11 DATA sample;
12 SET sashelp.class;
13 %INCLUDE
13 ! 'C:\Users\hed2\Downloads\code-storage\code\format\use_format.sas
13 ! ';
WARNING: Physical file does not exist,
C:\Users\hed2\Downloads\code-storage\code\format\use_format.sas.
ERROR: Cannot open %INCLUDE file
C:\Users\hed2\Downloads\code-storage\code\format\use_format.sas.
14 RUN;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.SAMPLE has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
15
16 proc freq data=sample;
17 table sex;
18 run;
NOTE: There were 19 observations read from the data set WORK.SAMPLE.
NOTE: The PROCEDURE FREQ printed page 1.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.06 seconds
cpu time 0.00 seconds
ERROR: Errors printed on page 1.
The FREQ Procedure
Cumulative Cumulative
Sex Frequency Percent Frequency Percent
--------------------------------------------------------
F 9 47.37 9 47.37
M 10 52.63 19 100.00
This pattern is helpful when: - many variables require formatting, - format assignment rules are long, - you want to keep the main analysis code clean.
9.4.2 Modify formats
User-defined formats cannot be edited “in place” like a database table; you typically redefine them.
Your example also shows using other= to handle all remaining values.
/*User-defined formats cannot be edited. to create a macro by myself ; or proc freq then use excel*/;
PROC FORMAT ;
VALUE $GENDER
"M" = "Male_new"
"F" = "Female_new";
RUN;
proc format;
value $sex_f "F"="Female_f" other=[5.1];
run;
DATA sample;
SET sashelp.class;
FORMAT sex $sex_f. ;
RUN;
proc freq data=sample;
table sex;
run;
The FREQ Procedure
Cumulative Cumulative
Sex Frequency Percent Frequency Percent
-------------------------------------------------------------
Female_f 9 47.37 9 47.37
M 10 52.63 19 100.00
9.4.3 Transform missing to character then to numeric again
This technique appears in recoding tasks where you need numeric categories but start from character labels.
PROC FORMAT ;
VALUE $num
"M" = 1
"F" = 2;
RUN;
DATA sample_2;
SET sashelp.class;
sex_2 = input(put(sex ,num.),best.); /*Converting variable types and keep*/
RUN;
proc freq data=sample_2;
table sex_2;
run;
The FREQ Procedure
Cumulative Cumulative
sex_2 Frequency Percent Frequency Percent
----------------------------------------------------------
1 10 52.63 10 52.63
2 9 47.37 19 100.00
Mechanism:
- put(sex, num.) maps “M”/“F” to “1”/“2” as character,
- input(..., best.) converts that character into numeric.
9.4.4 Output format definition details
When you need to inspect what a format contains, you can output it.
PROC FORMAT ;
VALUE $GENDER
"M" = "Male_new"
"F" = "Female_new";
RUN;
proc format library = work.formats FMTLIB cntlout = cntlout;
select $GENDER;
run;
---------------------------------------------------------------------------
-
| FORMAT NAME: $GENDER LENGTH: 10 NUMBER OF VALUES: 2
|
| MIN LENGTH: 1 MAX LENGTH: 40 DEFAULT LENGTH: 10 FUZZ: 0
|
|--------------------------------------------------------------------------
|
|START |END |LABEL (VER. V7|V8
20FEB2026:16:45:16)|
|----------------+----------------+----------------------------------------
|
|F |F |Female_new
|
|M |M |Male_new
|
---------------------------------------------------------------------------
-
This creates cntlout, which is a dataset representation of the format—useful for QA and documentation.
9.4.5 A macro to copy an existing PROC FORMAT
This longer macro demonstrates a common real need: reconstruct a format definition programmatically (for reuse, reporting, or migration).
%macro formatd(data, var);
data temp;
set &data;
keep &var;
run;
/*original freq*/
ods output OneWayFreqs=sample_b ;
proc freq data=temp ;
table &var /missing ;
run;
ods output close;
/*delete format freq*/
data sample2;
set temp;
Format _all_;
run;
proc freq data=sample2 ;
table &var /missing out=sample_c ;
run;
/*select original variable code*/
proc sql noprint;
select name into :firstvar_b from dictionary.columns where libname='WORK' and memname='SAMPLE_B'
and varnum=2;
quit;
data sample_b2;
set sample_b;
Keep &firstvar_b ;
run;
/*select original variable label*/
proc sql noprint;
select name into :firstvar_c from dictionary.columns where libname='WORK' and memname='SAMPLE_C'
and varnum=1;
quit;
data sample_c2;
set sample_c;
Keep &firstvar_c ;
run;
/*merge variable code and label*/
data sample_bc;
set sample_b2 (RENAME=(&firstvar_b=new_b));
set sample_c2 (RENAME=(&firstvar_c=new_c));
run;
/*create format format and output*/
data sample_bc;
set sample_bc;
Original_format = CATS(new_c,"=","'",new_b,"'");
run;
proc print data=sample_bc noobs;
var Original_format;
run;
%mend formatd;
PROC FORMAT ;
VALUE $GENDER
"M" = "Male_new"
"F" = "Female_new";
RUN;
DATA sample;
SET sashelp.class;
FORMAT sex $GENDER. ;
RUN;
%formatd(sample, sex );
The FREQ Procedure
Cumulative Cumulative
Sex Frequency Percent Frequency Percent
---------------------------------------------------------------
Female_new 9 47.37 9 47.37
Male_new 10 52.63 19 100.00
The FREQ Procedure
Cumulative Cumulative
Sex Frequency Percent Frequency Percent
--------------------------------------------------------
F 9 47.37 9 47.37
M 10 52.63 19 100.00
Original_
format
F='Female_new'
M='Male_new'
Reading the intent:
- sample_b captures formatted labels (via ODS output from PROC FREQ).
- sample_c captures raw unformatted codes.
- The macro merges them to reconstruct mapping rules like "Male_new"='M'.
This is a clever QA/automation trick when you inherit legacy formats but need documentation.
9.4.6 A macro to view the list of variables
Finally, a compact utility macro to list variable names in a dataset.
%macro varnames (dat);
proc contents
data = &dat
noprint
out = data_info
(keep = name );
run;
proc print
data = data_info
noobs;
run;
%mend;
%varnames (sashelp.class)
NAME
Age
Height
Name
Sex
Weight
This is particularly helpful inside SASmarkdown documents because it provides quick visibility when you cannot “click around” like in SAS Studio.
9.4.7 Practical workflow recommendations for SASmarkdown
When you write SAS inside R Markdown, the biggest productivity gains come from being explicit:
- Make each SAS chunk self-contained, or write intermediate datasets to a permanent library.
- Use ODS OUTPUT whenever you need a procedure result as a dataset for downstream steps.
- Prefer PROC SQL joins for clarity unless you specifically need DATA step merge behavior.
- Store formats in a permanent library if the report will be knitted repeatedly.
With these habits, SASmarkdown becomes a reliable way to produce reproducible, publication-ready analysis documents that combine narrative, code, and output in one place.