Abstract: This is a brief guide to the essentials you need to know to write a SAS program. See the two-page handout, SAS on the Cunix Cluster, or the SAS companions for your particular operating system for instructions on running a SAS program. On Windows, select the whole program and click the running-man icon.
A SAS Program consists of a series of SAS statements which are used to define, read in, manipulate, and analyze data. The typical SAS program is organized into three parts:
|1. Data Definition and Options||Define the data location and the environment.|
|2. Data Step||Read, modify, subset, and write the data|
|3. Procedure(s)||Perform an action on the data, e.g. sort the data, compute means, run a regression, etc.|
An example of a complete program is at the end.
The data step takes most of the time, so plan accordingly. This document will review each of the three parts in some detail, giving examples of each. It will also provide a few tips in a section on Additional Information, and will show a complete sample program at the end. The examples included here will all point to unix directory naming structures. If you are working on another operating system, only the directory naming conventions will differ.
Since SAS statements are the basis of all three parts of a SAS program, there are a few generalities which must be mentioned about them:
- All SAS statements end with a ; (semicolon).
- Almost all SAS statements begin with a SAS keyword, e.g. data, set, proc, infile, input, title, if, options, etc. The exceptions are assignment statements, e.g. age=curryr-birthyr;
- All SAS statements are free format, i.e. they can begin in any column and can run onto additional lines without any regard for column location. However, it is advised for clarity that you structure your program for easy reading on your screen.
- Quotes can be single or double but they must match.
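As a small illustration of these rules, the two data steps below are meant to be equivalent. The variable names x and y are hypothetical, and data _null_ is used only so that no data set is created. Each step writes the two values to the SAS log.

```sas
/* Laid out for easy reading: one statement per line. */
data _null_;
  x = 'single quotes';    /* opening and closing quotes match */
  y = "double quotes";
  put x= y=;              /* writes both values to the log */
run;

/* The same statements in free format: legal, but hard to read. */
data _null_; x='single quotes';
     y=
 "double quotes"; put x= y=; run;
```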
The Data Definitions and Options are at the top of most SAS programs. The first thing you need to do is to tell SAS where the data is. In other words, you must define the location of the data on your computer or storage device. To define the location of the data in SAS, you need to know:
- The type of data you are working with, i.e. is it raw data or a SAS data set?
- If you are reading in raw data, you need to know the length, or lrecl, of the records in the file.
- Where the data is. For example:
|survey1.dat||Raw data in your directory on cunix|
|survey1.sas7bdat||A SAS data set in your directory on cunix|
|C:\Documents and Setting\User\My Documents\survey1.dat||A raw data file on Windows|
|C:\Documents and Setting\User\My Documents\survey1.sas7bdat||A SAS data set on Windows|
For reading in raw data, you need a filename statement to identify its location on your computer. On cunix, for example:
filename rawin '/p/s/sz/sas/survey1.dat' lrecl=1880;
This statement assigns a ddname (data definition name), rawin, to associate with the raw data file survey1.dat which is located in the unix directory /p/s/sz/sas/ and has a record length of 1880. (Hint: If you are unsure of the directory name, give the unix command "pwd" to find out.)
For reading in a SAS data set, you also need a libname statement. For example:
libname sasdata '/p/s/sz/sas/';
|Note: filename statements point to specific files while libname statements simply point to directories.|
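The difference can be sketched as follows, reusing the example paths from this section. The proc contents step is an addition for illustration: it is one way to list the SAS data sets that a libname directory contains, and it assumes the directory exists and is readable.

```sas
filename rawin '/p/s/sz/sas/survey1.dat' lrecl=1880;  /* one specific file */
libname sasdata '/p/s/sz/sas/';                       /* a whole directory */

/* List the SAS data sets found in the sasdata directory. */
/* nods suppresses the detailed listing for each data set. */
proc contents data=sasdata._all_ nods;
run;
```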
An options statement is used to define an environment for the program. It changes the standard settings. Some common options include:
- ls – defines the line size for output.
- obs – limits the number of observations processed to allow for program testing on a small subset rather than reading in the entire data set.
- nocenter – writes all the output in the log and listing files flush left.
A statement with all these options defined would appear as follows:
options ls=78 obs=5 nocenter;
A full list of options is available in the SAS Language guide.
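A typical use of obs is a short test run followed by a full run. The sketch below assumes the sasdata.survey1 data set used elsewhere in this guide. Setting obs=max afterwards restores processing of all observations.

```sas
/* Test settings: only the first 5 observations are processed. */
options ls=78 obs=5 nocenter;

libname sasdata '/p/s/sz/sas/';

proc print data=sasdata.survey1;   /* prints at most 5 observations */
run;

/* Once the program works, remove the limit for the real run. */
options obs=max;
```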
The data step portion of a SAS program creates a SAS data set, either permanent or temporary, from raw data or from another SAS data set. SAS procedures may only be run on SAS data sets. Therefore all data must be converted to that format in order to run any analyses.
A temporary SAS data set is one that is created in the program and automatically erased when the program is finished. This is often useful while you are in the testing stages of your analysis. Creating a permanent SAS data set is useful once you have your data in the form you need and you expect to be working with the data set repeatedly.
This section will review the following facets of the data step:
- Reading Raw Data
- Reading a SAS Data Set
- Selecting and Modifying the Data
- Saving a Permanent SAS dataset
- Example of the SAS statements in a complete data step
If you are reading raw data you need:
- A filename statement pointing to the location of the raw data.
- A list of the variables you want.
- Each variable’s column position(s) in the file, e.g. 1-3.
- Each variable’s type, e.g., numeric or alphanumeric.
- A data statement followed by a name you will assign to the SAS file.
- An infile statement pointing to the raw data defined in the filename statement.
- An input statement followed by the list of variables, their positions, and type.
- Assignment statements subsetting the data or creating or modifying variables (optional).
- A run statement signaling the end of the data step.
Example:
filename rawin '/eds/datasets/userfiles/temp01/data/hh.dat' lrecl=2341;
data one;
  infile rawin;
  input idnum 1-4 age 7-9 state $ 17-18 sex 55-55;
run;
In the data step above, a temporary SAS data set called one is created. This will be referred to in the SASLOG as work.one. This data set can be used in subsequent data steps and procedures within the program. Because it is temporary, this file will be erased automatically when the program completes its run.
Note: In the input statement, $ is used to indicate alphanumeric variables, as in state. Since alphanumeric, or character, format is a superset of numeric format, any variable may be read in as character. However, only numeric variables may be used in analyses such as regressions. So even if all of the values are numbers, a variable that is defined as character cannot be used for analysis.
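If a variable containing only numbers has already been read in as character, one common remedy is to create a numeric copy with the input function. The sketch below is an illustration only: the data set demo, the variables agec and age, and the three data lines are all hypothetical.

```sas
data demo;
  input idnum agec $;       /* agec is deliberately read as character */
  age = input(agec, 8.);    /* numeric copy made with the 8. informat */
  datalines;
1 34
2 7
3 61
;
run;

/* age is numeric, so it can be used in an analysis; agec cannot. */
proc means data=demo;
  var age;
run;
```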
If you have a SAS data set, either your own or one you’ve received, you can either run SAS procedures directly on it, or you can read it in and make some changes using the data step. See the Procedures section for an example on running analysis directly on a SAS data set.
If you need to revise the data, e.g. take a subset, create a new variable, etc., you must first read the data set in.
To do this, you will need:
- A libname statement defining the location of the SAS data set.
- A data statement followed by a name you will assign to the New SAS data set.
- A set statement followed by the name of the SAS data set.
e.g.,
libname sasdata 'C:\Documents and Setting\User\My Documents\';
data one;
  set sasdata.survey1;
Note: The set statement refers to a file called survey1.sas7bdat in the Windows directory named in the libname statement. The ddname you use in the libname statement acts as a placeholder to be used in the set statement. It is not part of the file’s name. In your directory, the extension will always be .sas7bdat, regardless of the ddname you use for the libname.
As in the previous example, a temporary SAS data set called one (or work.one) is created. It is available to be used in subsequent data steps and procedures within the program. This data set will automatically be erased when the program completes its run since it is temporary.
It is important to realize that any changes you make in the data step will only affect the temporary data set. The original data set survey1.sas7bdat will remain unchanged.
If you need to select or modify data, it must be done within the data step. Some common SAS statements for this are if statements, assignment statements, and where statements.
If statements – This command selects whole observations (cases), usually people, and performs an action on those observations, e.g.:
|if sex = 1;||Keeps only those observations where sex=1.|
|if state = 'JN' then state='NJ';||Changes the state value to NJ in any observation where it is JN.|
|if racegrp in(4,5,6,8);||Keeps observations in any of race groups 4, 5, 6 or 8.|
The values for variables in SAS statements must be quoted if the variable is a character variable and must be unquoted if it is numeric. If you are unsure whether a variable is character or numeric, you should run proc contents on the data set.
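A minimal sketch of that check, assuming the sasdata.survey1 data set and the unix directory used in the earlier examples. The output lists each variable with its type (Num or Char) and length.

```sas
libname sasdata '/p/s/sz/sas/';

/* Shows variable names, types, and lengths for the data set. */
proc contents data=sasdata.survey1;
run;
```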
You may also delete specific observations using the if statement:
if racegrp=5 then delete;
Warning! The effect of multiple if statements is cumulative: an observation is kept only if it passes every subsetting if statement in the data step.
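The cumulative effect can be seen in a small made-up example. The data set names people and subset and the four data lines are hypothetical; only observations that pass both if statements reach the output data set.

```sas
data people;
  input idnum sex racegrp;
  datalines;
1 1 4
2 1 2
3 2 5
4 1 8
;
run;

data subset;
  set people;
  if sex = 1;                /* drops idnum 3 */
  if racegrp in (4,5,6,8);   /* of those remaining, drops idnum 2 */
run;

/* Only idnum 1 and idnum 4 satisfy both conditions. */
proc print data=subset;
run;
```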
Assignment Statements – creating new variables or changing the values of an existing one. Examples of creating a new variable are:
newage=0;
yearetr=byear+65;
income=salary+interest+divdnds;
In changing the values of existing variables, it is best to do this on a new variable created from an old one, so that you don’t lose the original values, e.g.
if 0 le age le 18 then agegrp=1;
else if 18 lt age lt 65 then agegrp=2;
else agegrp=3;
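One caution with a final else: SAS treats a missing numeric value as smaller than any number, so an observation with a missing age fails both range tests and lands in the last group. The sketch below is one possible guard, not part of the original example. The data sets ages and grouped and the data lines are hypothetical.

```sas
data ages;
  input idnum age;
  datalines;
1 12
2 40
3 70
4 .
;
run;

data grouped;
  set ages;
  if age = . then agegrp = .;              /* keep missing as missing */
  else if 0 le age le 18 then agegrp = 1;
  else if 18 lt age lt 65 then agegrp = 2;
  else agegrp = 3;
run;

/* Cross-check the recode against the original values. */
proc freq data=grouped;
  tables age*agegrp / list missing;
run;
```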
Where statements – used to subset observations, e.g.
data one;
  set sasdata.survey1;
  keep sex age race income;
  where sex=1;
run;
This will create a work data set which subsets sasdata.survey1 to keep only those observations for which sex is equal to 1. The keep statement tells SAS to only keep those variables in the data set one.
A where statement is the only data modification statement that can be used in procedures as well as in the data step. For example,
proc freq data=sasdata.survey1;
  where sex=1;
run;
does not write out a data set, but runs the analysis on those records for which sex is equal to 1.
If you would like a SAS data set to be saved for use beyond the program in which you create it, you must create a permanent SAS data set. The steps are:
- Decide where you want to save the file and put in a libname statement pointing to that directory.
- Use a two-part name in the data step, with the ddname from the libname statement being the first part of the name.
e.g.
filename rawin 'C:\Documents and Setting\User\My Documents\survey1.dat' lrecl=1880;
libname sasdata 'C:\Documents and Setting\User\My Documents\';
data sasdata.survey1;
  infile rawin;
  input idnum 1-4 age 7-9 etc.
This will create a file in the subdirectory C:\Documents and Setting\User\My Documents\ within the user’s home directory. It will be written to disk under the name survey1.sas7bdat.
A complete data step with data modifications would then look something like this:
data one;
  set sasdata.survey1;
  where sex=1;
  yearetr=byear+65;
  income=salary+interest+divdnds;
  agegrp=age;
  if 0 le age le 18 then agegrp=1;
  else if 18 lt age lt 65 then agegrp=2;
  else agegrp=3;
run;
SAS procedures are used to perform an action on the data. This includes running any sort of statistical analyses, including chi-squares, regressions, means, frequencies, and plots, as well as just sorting or looking at your data (e.g. by running proc print). To use SAS Procedures:
- Decide what statistical procedures are appropriate for your research. You and your advisor/statistician have to do this. EDS does not provide statistical consulting.
- Very Important!!! First check the data using simple procedures such as proc freq, proc means, and proc print. You need to run frequencies on all the variables you are going to use in your analysis so you know what your data looks like. If you are reading in an existing SAS data set, proc contents will give you information on variable names and types.
- Look up the particular procedure command in the manual and choose the subcommands and options you need.
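The data-checking step above might look like the following sketch. It assumes the temporary data set one from the raw-data example, with variables idnum, age, state, and sex. The statistics requested in proc means are one reasonable choice, not a requirement.

```sas
/* Variable names, types, and lengths. */
proc contents data=one;
run;

/* Frequencies for the categorical variables. */
proc freq data=one;
  tables sex state;
run;

/* Range and missing-value check for a numeric variable. */
proc means data=one n nmiss min max mean;
  var age;
run;

/* Look at the first 10 observations directly. */
proc print data=one (obs=10);
run;
```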
Procedure statements follow the SAS statement form in that they begin with the keyword proc followed by the procedure name, any subcommands, and the relevant options, e.g.
proc freq data=one; tables sex age; run;
You can also run a procedure directly on a permanent SAS data set. So if you don’t need to subset your data or make any variable edits, you don’t even need to include the data step in the program, e.g.
libname sasdata '/p/s/sz/sas/data';
proc freq data=sasdata.survey1;
  tables sex age state;
run;
Some other nice (but optional) commands:
- TITLE – Puts a Title line on your output. (Also TITLE2-TITLE10 for additional lines of title)
- * – Begins a comment that runs to the next semicolon, so end each comment line with a ; – Much recommended.
- /* followed by */ – Another way of commenting. Everything in-between is commented out including semicolons.
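These commands can be combined as in the sketch below, which assumes a data set called one with a variable sex. The title text is made up for illustration. A title statement with no text clears the titles so they do not carry over to later output.

```sas
* A statement-style comment ends at the next semicolon;

/* A block comment can contain semicolons; they are ignored. */

title 'Survey frequencies';
title2 'Hypothetical second title line';

proc freq data=one;
  tables sex;
run;

title;   /* clears all title lines */
```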
Example of a Complete Program:
/* Read in poll data, select out all females and save the file as a
   permanent SAS data set. */
options ls=78 nocenter;
filename rawin 'C:\Documents and Setting\User\My Documents\survey1.dat' lrecl=1880;
libname sasdata 'C:\Documents and Setting\User\My Documents\';
data sasdata.survey1;
  infile rawin;
  input idnum 1-4 age 7-9 state $ 17-18 sex 55-55;
  if sex=1;
  if age lt 18 then agegrp=1;
  else if 18 le age lt 65 then agegrp=2;
  else if age ge 65 then agegrp=3;
run;
proc print data=sasdata.survey1 (obs=10);
  title 'First 10 records';
run;
proc freq;
  tables state;
  where agegrp=2;
  title 'Frequencies on Females';
  title2 'Between the Ages of 18 and 65';
run;