



Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Community
Ask the community for help and clear up your study doubts
Discover the best universities in your country according to Docsity users
Free resources
Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors
An advantage of PROC REPORT over PROC TABULATE is its ability to define columns and manipulate the rows of a report. One important report used by statisticians ...
Typology: Exams
1 / 6
This page cannot be seen from the preview
Don't miss anything!
PROC REPORT is a tool for tabulating and reporting the contents of SAS® data sets. PROC REPORT’s strength is its ability to create table columns, modify individual cell values, and use information from different rows of the report. This flexibility allows users to create tables that show cumulative totals and row differences. Cumulative totals are used to report values such as the number of responses year-to-date, percent of a total for all periods to date, and cumulative distribution functions. Row differences show changes between row. Two PROC REPORT features that allow this are the COMPUTE block, and the distinction that PROC REPORT makes between report and data variables. This paper discusses how PROC REPORT builds a report, COMPUTE blocks, and what report and data variables are and how they work. The code to produce one and two level tables containing both cumulative row totals and row differences are shown with explanatory comments.
Key word: PROC REPORT, Cumulative Total, Cumulative Percent, Row Difference
PROC REPORT is an excellent tool for preparing standard reports. Not all reports however fit the standard. An advantage of PROC REPORT over PROC TABULATE is its ability to define columns and manipulate the rows of a report. One important report used by statisticians is a table of cumulative totals or percent. Cumulative totals are needed when displaying both the number of responses to a survey and the cumulative responses to date by month. Often it is also important to show the change from one row to the next.
This paper discusses the following PROC REPORT features: (1) the difference between a report variable and a data variable, (2) how PROC REPORT builds a report, and (3) the use of the compute statement. PROC REPORT is flexible enough to allow the calculation a cumulative total and row change. A complete two level table showing cumulative totals and row differences is given in Table 1. The code for this final table is illustrated by starting with a simple table and progressively adding more complexity. For each table the code is given and annotated to explain what is being done and why. The code to generate the artificial data used in the example is given in the Appendix. The purpose of this paper is to show how to produce Table 1.
Table 1 is the final table. This PROC REPORT table contains both cumulative totals and row differences. The first four columns of this tables are standard PROC REPORT columns. Columns (5) and (6) show the absolute and relative difference between two row totals. Columns (7) and (8) show the cumulative total and percent associated with row totals. Each of these columns is discussed in detail in later sections. Using Table 1 you can determine a quarter’s current sales, the change in sales from the previous quarter, and the total sales for the year to date. The SAS code to produce this table is given later.
PROC REPORT builds a table in a two step process. It first constructs a temporary data set. This temporary data set is then used along with compute blocks and data variables to create and display the table. A detailed description of the process is given in
“The PROC REPORT Procedure”, Chapter 32 of Version 8, SAS Procedures Guide.”
TABLE 1: FINAL TABLE _SALES QUARTER CHANGE YEAR-TO-DATE YEAR QTR Total Percent TOTAL PERCENT YTD PERCENT (1) (2) (3) (4) (5) (6) (7) (8)
97 1 $1,658 11% (na) (na) $1,658 11% 2 $4,068 27% $2,410 145% $5,726 38% 3 $1,311 9% $-2757 (68%) $7,037 47% 4 $8,038 53% $6,727 513% $15,075 100%
97 $15075 100% (na) (na).. ----- ------ --— ------ ----- ------ ----
98 1 $1,534 10% (na) (na) $1,534 10% 2 $4,321 28% $2,787 182% $5,855 38% 3 $1,157 7% $-3164 (73%) $7,012 45% 4 $8,532 55% $7,375 637% $15,544 100%
98 $15544 100% (na) (na)..
99 1 $1,600 11% (na) (na) $1,600 11% 2 $3,815 26% $2,215 138% $5,415 37% 3 $1,218 8% $-2597 (68%) $6,633 45% 4 $8,079 55% $6,861 563% $14,712 100%
99 $14712 100% (na) (na)..
Creating The Temporary Data Set PROC REPORT summarizes the input data set for group, order, and across variables. As it does this, it also calculates statistics needed in both summary and detail table lines. It creates the records associated with RBREAK, BREAK, and COMPUTE BEFORE | AFTER statements for report variables. A record can be written to the temporary data set for lines not published in the report. Creation of a preliminary data set gives the PROC REPORT access to group totals before the first line of the report or a group is printed. The important thing to realize is that the temporary data set has one record for each line in the displayed table. Depending on the PROC REPORT code used, it can also have additional records that are used but not displayed. Creation of tables with a cumulative total or row difference depends on using records in the temporary data set that are not displayed.
Using The Temporary Data Set To Build The Report Once a temporary data set is created, PROC REPORT creates the report line by line. It initializes all data step variables to missing and then sequentially constructs each row of the report. All report variables are initialized to missing at the beginning of each line of the report. The value of each report variable is determine from left-to-right; values for computed variables come from executing statements in the compute block. Only report variables to the left of a report variable being computed can be used in the calculation. All other values are obtained from the temporary file created at the start. A key feature of PROC REPORT is using data variables and compute blocks.
Data Variables Versus Report Variables A PROC REPORT variable is either a data variable or a report variable. Data variables do not appear in the COLUMN statement; they appear only in a COMPUTE block. Data variables are set to missing initially and remain missing until specifically assigned a value. Once assigned a value, they keep that value until it is changed. Data variables hold their value between COMPUTE blocks for the same report line and between different report lines. Data variables behave like data step
variables when a retain statement is used. Report variables appear in a COLUMN statement. They appear in the report unless a NOPRINT option is used in the DEFINE statement. Report variable appear in the report in the order they occur on the COLUMN statement. There are two varieties of report variables: (1) variables that come from the input data set (e.g. Sales) and (2) COMPUTED variables that are assigned values in a COMPUTE block (e.g. YTD). A report variable may or may not appear in a compute block. Report variables are initialized to missing at the beginning of each row of the report; this is similar to how data set variables are handled in the data step with the program data vector. Computed variables are created from left to right as they appear on the COLUMN statement.
Compute Blocks A compute block is one or more programming statements that appear between the COMPUTE and ENDCOMP statements. A COMPUTE block can be associated with either a report item (data set variable, statistic, or computed variable), a location (before or after a group of observations, or at the top or bottom of a page or of the complete report). A compute block with a report variable can be used either to define the computed variable (a variable not on the input data set), to modify a report variable, or to change the display characteristics of a report variable. An example of a compute block to create a new variable is given below.
COMPUTE PROFIT; PROFIT = REVENUE.SUM - EXPENSES.SUM ENDCOMP;
The final table starts with a simple Table 2 and adds different levels and columns to become more complex. This basic table is column two, three and four of a single level of Table 1.
The Basic Table Table 2 is the basis for Table 1.
Table 2: 1998 SALES BY QUARTER ______SALES_______ QUARTER TOTAL PERCENT (1) (2) (3) 1 $1,534 10% 2 $4,321 28% 3 $1,157 7% 4 $8,532 55%
$15,544 100.0%
SAS Code The code to produce this table is given below.
WHERE(year= 98 ); COLUMN quarter ('SALES' sales
DEFINE quarter /GROUP FORMAT=8. QUARTER(1)'; DEFINE sales /ANALYSIS SUM FORMAT=DOLLAR8. 'TOTAL(2)'; DEFINE pct /ANALYSIS PCTSUM FORMAT=PERCENT8. 'PERCENT*(3)';
Things To Notice In Table 2 Code Several things to notice in this generic PROC REPORT code are:
headers,
total for all groups.
The basic characteristic of a cumulative total column are: (1) the cumulative total for the first row is always equal to the sum from the first row, (2) the cumulative total from the last row is always equal to the sum of all row totals, (3) the cumulative total of a row is equal to the row total plus the cumulative total of the previous row, and (4) when row totals are positive the cumulative total for any row is greater than the cumulative total of the previous row.
Simple Cumulative Total Table Table 3 is Table 2 with a cumulative total and percent column added using data variables and a COMPUTE BEFORE block.
TABLE 3: 1998 SALES BY QUARTER and YEAR TO DATE _______________SALES________________ BY QUARTER YEAR TO DATE QUARTER Total Percent TOTAL PERCENT (1) (2) (3) (4) (5) 1 $1,534 10% $1,534 10% 2 $4,321 28% $5,855 38% 3 $1,157 7% $7,012 45% 4 $8,532 55% $15,544 100%
$15,544 100% .... ....
SAS Code Table 3 is produced by modifying and adding to the code used to produce Table 2.
PROC REPORT DATA=QUARTER nowd CENTER
WHERE(year= 98 ); COLUMN quarter ('SALES' ("BY QUARTER" sales sales=pct) ('YEAR TO DATE' ytd cumpct ) ) ; DEFINE quarter / GROUP FORMAT=8. 'QUARTER(1)' CENTER; DEFINE sales / ANALYSIS FORMAT=DOLLAR8. SUM 'Total(2)'; DEFINE pct / ANALYSIS PCTSUM FORMAT=PERCENT8.0 'Percent(3)'; DEFINE ytd / COMPUTED "TOTAL(4)" FORMAT= DOLLAR8.0 ; DEFINE cumpct / COMPUTED 'PERCENT*(5)' FORMAT=percent8.0;
cumtot+sales.SUM; ytd=cumtot; IF BREAK="RBREAK" THEN ytd=. ; ENDCOMP; COMPUTE cumpct; cumpct=ytd/total; ENDCOMP; RBREAK AFTER /SUMMARIZE UL OL; RUN;
Things To Notice In Table 3 Code The key to calculating cumulative totals and percent is to save the grand total at the beginning and to calculate the cumulative total correctly. The grand total is needed to calculate the cumulative percent. To do this requires the use of data variables and a COMPUTE BEFORE block.
record on the temporary data set that contains the total for all report variables on the input data set. This is the source
BREAK AFTER year / SUMMARIZE SKIP UL OL ; RUN;
Things To Notice in Table 4 Code Calculating cumulative totals and percent separately for different groups in a PROC REPORT table is almost the same as doing it for one group. The key is to initialize the cumulative total data variables at the start of each group and not to compute a grand total. Key features in the code are:
year available before each quarter record is created.
of “sales.sum”.
The value of “cumtot” is set to zero every time there is a new value for “year”.
make “cumtot” the cumulative total for the year.
“cumtot”.
When the report line is for the yearly total, the value of BREAK will be “year” and the sum of the cumulative totals and percent for the year are set to missing because the answer is meaningless.
The cumulative percent is calculated for each quarter using the cumulative total for the year (“cumtot”) and the yearly total (“total”).
The quarterly percent is calculated by dividing the sum of “sales” for the quarter (“sales.sum”) by the total sales for the year (“total”).
This creates the output data set for Table 4. Note that Table 4 has 15 rows of data and the output data set has 18 rows of data. The three extra rows of data in the output data set were
they are used by PROC REPORT to calculate the percent and cumulative percent column
Output Data Set Four Obs year QTR sales pct ytd cumpct BREAK 1 97. 15075... year 2 97 1 1658 0.10998 1658 0. 3 97 2 4068 0.26985 5726 0. 4 97 3 1311 0.08697 7037 0. 5 97 4 8038 0.53320 15075 1. 6 97. 15075 1.00000.. year 7 98. 15544... year 8 98 1 1534 0.09869 1534 0. 9 98 2 4321 0.27799 5855 0. 10 98 3 1157 0.07443 7012 0. 11 98 4 8532 0.54889 15544 1. 12 98. 15544 1.00000.. year 13 99. 14712... year 14 99 1 1600 0.10875 1600 0. 15 99 2 3815 0.25931 5415 0. 16 99 3 1218 0.08279 6633 0. 17 99 4 8079 0.54914 14712 1. 18 99. 14712 1.00000.. year
A difference table is a summary table where values in at least one column equal a current row total minus the previous row total. The difference shows the change in the current row from the previous row. When rows are time periods, differences show either the absolute or relative change in a population from one
time period to the next. In Table 5 the difference represents the change in sales for a particular quarter. On the first row, the value is set to missing. Subsequent rows are the difference between that row and the previous row.
Simple Difference Table Table 5 is a simple difference table showing the change in sales from one quarter to the next. Column (4) shows the absolute change in the current row from the previous row. Column (5) shows the percent or relative change.
Table 5: SIMPLE DIFFERENCE TABLE ____SALES ____ QUARTER CHANGE QUARTER TOTAL PERCENT CHANGE PERCENT (1) (2) (3) (4) (5)
1 $1,534 10%.. 2 $4,321 28% $2,787 1817% 3 $1,157 7% $-3,164 ( 73%) 4 $8,532 55% $7,375 637%
$15,544 100%..
SAS Code The SAS code that produces table 5 is given below.
PROC REPORT DATA=QUARTER NOWD OUT=Five CENTER HEADSKIP HEADLINE; WHERE(YEAR= 98 ); COLUMN ("QUARTER" quarter ) ("SALES" sales sales=pct ) ("QUARTER CHANGE" diff diff_pct) ; DEFINE quarter / GROUP FORMAT= 12. '(1)' CENTER; DEFINE sales / ANALYSIS FORMAT=DOLLAR8. SUM 'TOTAL(2)'; DEFINE pct / ANALYSIS PCTSUM FORMAT=PERCENT8.1 'PERCENT(3)'; DEFINE diff / COMPUTED FORMAT= DOLLAR8. 'CHANGE(4)' ; DEFINE diff_pct / COMPUTED FORMAT=percent8. 'PERCENT(5)' ; COMPUTE diff;
IF r> 1 THEN DO; last=sales.SUM;
if BREAK NE " " THEN
COMPUTE diff_pct;
Things To Notice In Table 5 Code
summary line so it will have a missing value for the difference.
line.
summary lines created by a RBREAK statement.
Puts the value of current summary line into a data variable called “last” needed to calculate the row difference.
Saves the current row total in the data variable “py”.
Calculates the percent change between the current and prior row. The variable “diff” contains the change; “last” now contains the current value of the row..
Produces the output data set given below.
Output Data Set Five
OBS QTR SALES PCT DIFF PCTDIF BREAK 1 1 1534 0... 2 2 4321 0.27799 2787 1. 3 3 1157 0.07443 -3164 -0. 4 4 8532 0.54889 7375 6.
Like the two level table with cumulative totals, the two level difference table is an extension of the one level table plus some straightforward changes. In Table 6, the simple difference table could have been used with a BY statement to create the table.
Two Level Difference Table The two level difference table is given in Table 6. It is just like Table 5 except there is a separate set of rows for each of the three years.
TABLE 6: Two Level Difference Table
____SALES_____ QUARTERLY CHANGE QUARTER TOTAL PERCENT TOTAL PERCENT YEAR (1) (2) (3) (4) (5)
97 1 $1,693 11%.. 2 $4,140 27% $2,448 145% 3 $1,413 9% $-2,728 ( 66%) 4 $8,167 53% $6,754 478%
97 $15,413 100%..
98 1 $1,560 10%.. 2 $4,405 28% $2,845 182% 3 $1,264 8% $-3,141 ( 71%) 4 $8,669 55% $7,405 586%
98 $15,897 100%..
99 1 $1,631 11%.. 2 $3,879 26% $2,248 138% 3 $1,317 9% $-2,561 ( 66%) 4 $8,201 55% $6,883 522%
99 $15,028 100%..
SAS Code The SAS code used to produce Table 6 is similar to the code use to produce Table 5.
PROC REPORT DATA=QUARTER NOWD OUT=Six SPLIT="" CENTER HEADSKIP HEADLINE; COLUMN ( year quarter ) ("SALES" sales sales=pct) ("CHANGE FROM PREVIOUS QUARTER_" diff diff_pct) ; DEFINE YEAR / GROUP; DEFINE quarter / GROUP FORMAT= 8. 'QUARTER(1)' CENTER;
DEFINE sales / COMPUTED FORMAT=DOLLAR8. SUM 'Total(2)'; DEFINE pct / ANALYSIS PCTSUM FORMAT=PERCENT8.0 'Percent(3)'; DEFINE diff / COMPUTED "TOTAL(4)" FORMAT= DOLLAR8.0 ; DEFINE diff_pct / COMPUTED 'PERCENT(5)' FORMAT=percent9.0; COMPUTE BEFORE year; r= 0 ; last= 0 ; total=sales.SUM; ENDCOMP; COMPUTE pct; pct=sales.SUM/total; ENDCOMP; COMPUTE diff; r+ 1 ; IF r= 1 THEN diff=. ; else DO; if BREAK EQ " " THEN diff=sales.SUM-last ; else diff =. ; end; last = sales.SUM; ENDCOMP; COMPUTE diff_pct; diff_pct= (diff/(last-diff) ); ENDCOMP; BREAK AFTER year / SUMMARIZE SKIP OL UL ; RUN;
Things To Notice in Table 6 Code The “COMPUTE BEFORE” block is used to created a record in the temporary data set needed to assign a value fo data variable “total”. The variable “total” is the denominator in calculating the variable “pct.” The “COMPUTE BEFORE” block also is used to initialize the line counter , “r”, to zero and the value of the last row, “last”, to zero.
The final table is given in Table 1. The code below produces that table and is just a combination of the code used to make Table 4 and Table 6. It uses all the techniques discussed earlier.
PROC REPORT DATA=QUARTER NOWD SPLIT="" headline; COLUMN ( year quarter ) ("SALES" sales pct ) ("QUARTERLY CHANGE" diff diff_pct ) ("YEAR-TO-DATE" ytd cumpct ); DEFINE year /GROUP 'YEAR(1)'; DEFINE quarter /GROUP FORMAT= 8. 'QUARTER(2)' CENTER; DEFINE sales /COMPUTED FORMAT=DOLLAR8. SUM 'Total(3)'; DEFINE PCT /COMPUTED FORMAT=PERCENT8. 'Percent(4)'; DEFINE diff /COMPUTED FORMAT= DOLLARA.0 "TOTAL(5)" center; DEFINE diff_pct/COMPUTED FORMAT=pctA. 'PERCENT(6)'; DEFINE ytd /COMPUTED FORMAT=DOLLAR8. 'YTD(7)' ; DEFINE cumpct /COMPUTED FORMAT=PERCENT8. 'PERCENT(8)'; COMPUTE BEFORE year; r= 0 ; last= 0 ; total=sales.SUM; cumtot= 0 ; ENDCOMP; COMPUTE diff; r+ 1 ; IF r= 1 THEN diff=.* ; IF r> 1 THEN DO; last=sales.SUM;