This document contains descriptions and context information for many of the sample data sets that are provided with
SAS Enterprise Guide.

----------------------------------------------------------------------- 
Name:  A

Analysis: Logistic ANOVA, Logistic Regression, Mixed Model ANOVA

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description: A clinical trial was conducted comparing two treatments, an experimental drug versus a control. The study
was conducted at eight clinics. At each clinic, patients were assigned at random to either the experimental drug or
the control. The variables in the data set are the clinic number (CLINIC), the assignment to the experimental drug or
the control (TRT), the number of patients with favorable responses (FAV), the number of patients with unfavorable
responses (UNFAV), and the total number of patients at that clinic assigned to that treatment (NIJ). 

For educational purposes, the data can be analyzed with CLINIC as a random or a fixed effect.
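
As a rough illustration of how the count columns relate to one another, the sketch below derives the total, the observed proportion of favorable responses, and the empirical log odds for each clinic-by-treatment row. The counts and treatment labels are made up and are not the values in data set A. The sketch assumes that NIJ equals FAV plus UNFAV.

```python
import math

# Hypothetical rows in the layout described above (CLINIC, TRT, FAV, UNFAV).
# The counts and TRT labels are invented for illustration only.
rows = [
    {"CLINIC": 1, "TRT": "drug", "FAV": 10, "UNFAV": 30},
    {"CLINIC": 1, "TRT": "cntl", "FAV": 6, "UNFAV": 34},
]

for row in rows:
    # Assumption: NIJ is the sum of favorable and unfavorable responses.
    row["NIJ"] = row["FAV"] + row["UNFAV"]
    # Observed proportion of favorable responses in this cell.
    row["P_FAV"] = row["FAV"] / row["NIJ"]
    # Empirical log odds (logit), the scale on which a logistic model is linear.
    row["LOGIT"] = math.log(row["FAV"] / row["UNFAV"])
```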

Graphic Analysis: scatter plot

SAS Product: Enterprise Guide; SAS/STAT

Size: 16 rows, 5 columns

----------------------------------------------------------------------- 
Name:  AML_Survival

Analysis: Survival Analysis

Reference:  Embury, S.H., Elias, L., Heller, P.H., Hood, C.E., Greenberg, P.L., 
and Schrier, S.L. 1977. Remission maintenance therapy in acute myelogenous 
leukemia. Western Journal of Medicine. 126: 267-272.

Description:  The dataset AML_Survival Data contains information on a trial 
conducted by Embury et al. (1977) at Stanford University.  The investigators 
were concerned with the efficacy of maintenance therapy for acute myelogenous 
leukemia (AML).  Initially, patients were treated by chemotherapy until 
remission.  Then, these patients were randomized into two groups: a treatment 
group that received maintenance therapy and a control group that did not.  
Individuals in both groups were followed until they suffered a relapse, the 
event of interest.  The event time variable is defined as the length of time in 
remission, i.e., the time from entry into the study until relapse.

Graphic Analysis:

SAS Product: Enterprise Guide; SAS/STAT

Size: 23 rows, 3 columns

----------------------------------------------------------------------- 
Name:  Arrestrates

Analysis: Descriptive statistics, time series analysis, ANOVA

Reference:  U.S. Department of Justice, Office of Justice Programs, Bureau of Justice Statistics
 http://www.ojp.usdoj.gov/bjs/dtdata.htm

Description: The data set is a record of the arrests per 100,000 people in each age group in the United States from 
1970 through 1999. The variables in the data set are: year (YEAR), arrests per 100 thousand in population (RATE), and 
the age group (AGEGROUP). The age groups are defined as (1) 14 and under, (2) 15-17, (3) 18-20, (4) 21-24, and
(5) 25 or over.

The data set could be used to generate descriptive statistics by age group or to do a time series analysis
to predict the arrest rates by age group. These predictions might be used in assessing the need for judicial 
system infrastructure changes. Finally, the data could be used to compare age groups with an ANOVA. 

This data set is a subset of the data in the data set totarrests.

Graphic Analysis: scatter plots

SAS Product: Enterprise Guide; SAS/STAT; SAS/ETS

Size: 150 rows, 3 columns

----------------------------------------------------------------------- 
Name:  auction

Analysis: multiple linear regression

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  This is data from 19 livestock auction markets. The columns include: the number of head of 
different livestock sold (in thousands) including CATTLE, CALVES, HOGS, and SHEEP, the cost of operation 
of the auction market (in thousands of dollars) (COST), and the market identifier (MARKETID). The object 
is to use multiple linear regression to describe the relationship between the cost of operations and the 
number of livestock sold in the various classes. COST will be the dependent variable and CATTLE, CALVES,
HOGS, and SHEEP the independent variables.

An additional variable, VOLUME, is the total of all major livestock sold in each market. It is the sum of
the variables CATTLE, CALVES, HOGS, and SHEEP, and can be used to demonstrate an exact linear dependency between
independent variables. 
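
The linear dependency can be checked numerically. The sketch below uses made-up counts, not the auction data, and a small helper, matrix_rank, written for this illustration. It shows that appending VOLUME to the four livestock columns adds a column but does not raise the rank of the matrix of independent variables.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Return the rank of a matrix using exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical (CATTLE, CALVES, HOGS, SHEEP) counts for five markets.
livestock = [(3, 1, 4, 2), (5, 2, 1, 1), (2, 6, 3, 0), (7, 1, 2, 5), (1, 3, 5, 4)]

# VOLUME is the row sum, so it is an exact linear combination of the others.
with_volume = [row + (sum(row),) for row in livestock]

rank_without = matrix_rank(livestock)
rank_with = matrix_rank(with_volume)
```

Because the rank stays the same after VOLUME is added, a regression that includes all five variables has no unique least-squares solution.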

Graphic Analysis: scatter plot

SAS Product: Enterprise Guide; SAS/STAT

Size: 19 rows, 7 columns

----------------------------------------------------------------------- 
Name:  Beer_Sales

Analysis: Spline Regression; Local Regression; Time Series

Reference:  Neter J., Wasserman, W., and Whitmore, G.A. 1988. Applied 
Statistics. 3rd edition. Allyn & Bacon, New York. 975-6.

Description:  Beer sales records monthly sales of beer in hectoliters, along 
with the average high and low temperatures in the region, over a period of five 
years.  The object is to see how beer sales change over time. You can also 
consider the relationships between beer sales and temperatures.

Graphic Analysis: Scatter Plots

SAS Product: SAS/STAT; SAS/ETS; Enterprise Guide (for time series analysis)

Size: 60 rows, 4 columns

----------------------------------------------------------------------- 
Name:  BloodPressure

Analysis: Paired-sample t-test; one-sample t-test; regression

Reference:  Generated for use in SAS Course Notes

Description:  Consider an experiment to examine the effectiveness of a 
medication in reducing blood pressure.  A random sample of individuals with high 
blood pressure is taken and their diastolic pressure is recorded.  The 
individuals are then placed on medication and one month later their diastolic 
blood pressure is once again recorded.  The dataset contains the following 
variables: subject, age, baseline blood pressure, and new blood pressure.
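
For readers who want to see the arithmetic behind the paired-sample t-test, here is a minimal sketch using Python's standard library. The readings are invented and the variable names are placeholders. They are not the column names or values of the BloodPressure data set.

```python
import math
import statistics

# Hypothetical diastolic readings for five subjects (not the actual data).
baseline = [95, 98, 102, 90, 96]
followup = [90, 95, 94, 86, 90]

# A paired test works on the within-subject differences.
differences = [b - f for b, f in zip(baseline, followup)]

n = len(differences)
mean_diff = statistics.mean(differences)
# Standard error of the mean difference, using the sample standard deviation.
std_error = statistics.stdev(differences) / math.sqrt(n)

# t statistic for the null hypothesis of zero mean difference, with n - 1 df.
t_statistic = mean_diff / std_error
degrees_of_freedom = n - 1
```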

Graphic Analysis: histograms, box plots, scatter plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 60 rows, 4 columns


----------------------------------------------------------------------- 
Name:  Boston Housing Data

Analysis: two-sample t-test; simple and multiple linear regression with 
transformation of the response variable

Reference:  Blake, C., Keogh, E. and Merz, C.J. (1998), UCI Repository of 
machine learning databases (http://www.ics.uci.edu/~mlearn/MLRepository.html), 
Irvine, CA: University of California, Department of Information and Computer 
Science.

Description:  The data set contains census information for 506 housing tracts in 
the Boston area.  You can perform a two-sample t-test to examine the median 
values of owner-occupied homes in two groups of housing tracts, those near the 
Charles River and those farther away from it. The data can also be used to 
develop a regression model to predict median home values based on the other 
variables in the data set, such as crime rate, the percentage of industrial 
business acres, nitrogen oxide concentration, the average number of rooms in a 
home, the percentage of homes built before 1940, the accessibility to radial 
highways, the property tax rate, the percentage of lower economic status 
families in the housing tract, and the pupil/teacher ratios in the local school.

Graphic Analysis: scatter plots, confidence ellipses

SAS Product: Enterprise Guide; SAS/STAT

Size: 506 rows, 13 columns

----------------------------------------------------------------------- 
Name:  bullets

Analysis: Descriptive statistics, confidence intervals, two-sample t-test

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  The data was collected to determine if there is a difference in the muzzle velocity (VELOCITY) of 
cartridges made from two types of gunpowder (POWDER). 

Graphic Analysis: box plot, histogram

SAS Product: Enterprise Guide; SAS/STAT

Size: 18 rows, 2 columns

----------------------------------------------------------------------- 
Name:  calves

Analysis: ANOVA

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Two types of feed rations (FEED) are given to calves from three different sires (SIRE). The dependent
variable is the coded amount of weight gain for each calf (WEIGHTGAIN).

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 18 rows, 3 columns

----------------------------------------------------------------------- 
Name:  calves2

Analysis: ANOVA

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Two types of feed rations (FEED) are given to calves from three different sires (SIRE). The dependent
variable is the coded amount of weight gain for each calf (WEIGHTGAIN). (Note: this is the same data as the calves
data set but with an empty cell)

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 18 rows, 3 columns

----------------------------------------------------------------------- 
Name:  Candy

Analysis:  Descriptive Statistics; Confidence Intervals

Reference:  Using StatView, 2nd edition. SAS Institute Inc.

Description:  Since 1994, the United States Food and Drug Administration (FDA) 
has required uniform, easy-to-read nutrition labeling for nearly all foods.  The 
purpose of the new label is to reduce confusion and help consumers choose more 
healthful diets.

The United States Department of Agriculture (USDA) and the Department of Health 
and Human Services (HHS) have teamed up to produce the Food Guide Pyramid, which 
recommends eating a variety of foods, an appropriate number of calories, and a 
modest amount of fat. Specifically, 30% or less of your total calories per day 
should be calories from fat, and only a third of those should be calories from 
saturated fat.  For adults consuming 2000 calories per day, that works out to no 
more than 65 grams of fat, no more than 20 grams of which are saturated fat.
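
The gram limits follow from the calorie percentages. The short sketch below reproduces the arithmetic, using the standard conversion of 9 calories per gram of fat. The unrounded results are slightly higher than the 65 g and 20 g quoted above, which suggests that the quoted limits are rounded down.

```python
DAILY_CALORIES = 2000
CALORIES_PER_GRAM_OF_FAT = 9  # standard conversion factor for dietary fat

# 30% of total calories may come from fat.
fat_calories = DAILY_CALORIES * 30 / 100
# One third of the fat calories may come from saturated fat.
saturated_fat_calories = fat_calories / 3

# Convert the calorie limits to grams.
fat_grams = fat_calories / CALORIES_PER_GRAM_OF_FAT
saturated_fat_grams = saturated_fat_calories / CALORIES_PER_GRAM_OF_FAT

print(f"Fat: {fat_grams:.1f} g, saturated fat: {saturated_fat_grams:.1f} g")
```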

We want to know how many candy bars can fit into this daily diet.  We found 
nutritional facts about every candy bar we could find.  We also included some 
non-bar candies like M&Ms, Reese's Pieces, Skittles, and Super hot Tamales.

Graphic Analysis: box plots; scatter plots; histograms

SAS Product: Enterprise Guide; Base SAS; SAS/GRAPH

Size: 75 rows, 17 columns

----------------------------------------------------------------------- 
Name:  Candy data sets: Candy_Customers, Candy_Products, Candy_Sales_History, 
       Candy_Sales_Summary, Candy_Time_Periods

Analysis: Descriptive statistics, time series analysis, ANOVA, correlations, 
  query, data management, data mining

Reference:  Stephen McDaniel, SAS, 2005

Description: This collection of data sets is for a fictional candy company, 
  Lots O' Calories.

Graphic Analysis: scatter plots, bar, box, bubble, line, bar-line

SAS Product: Enterprise Guide; SAS/STAT; SAS/ETS; SAS/GRAPH; 
  SAS Forecast Studio; SAS Enterprise Miner

----------------------------------------------------------------------- 
Name:  Cars

Analysis: Descriptive statistics; ANOVA

Reference:  StatView Reference, 2nd edition (1998). SAS Institute Inc.

Description:  The data set contains information on cars such as weight, gas tank 
size, turning radius, horsepower and engine displacement for 116 cars from 
different countries.

Graphic Analysis: scatter plots; box plots; histograms

SAS Product: Enterprise Guide; SAS/STAT; SAS/GRAPH

Size: 116 rows, 8 columns


----------------------------------------------------------------------- 
Name:  cars_1993

Analysis: descriptive statistics, t-tests, ANOVA, Regression, ANCOVA, data 
transformation

Reference: This represents a subset of the information reported in the 1993
Cars Annual Auto Issue published by Consumer Reports and from Pace New Car
and Truck 1993 Buying Guide 

Description:  A random sample of 92 1993 model cars is contained in this data 
set. The information for each car includes: manufacturer, model, type (sporty, 
van, small, midsize, large, or compact), price (in thousands of dollars), city 
mpg, highway mpg, engine size, horsepower, fuel tank size, weight, and origin 
(US or non-US). The data is excellent for doing descriptive statistics by groups 
or an ANOVA or regression with price as the response variable. Note that 
violations of the assumptions are probably present and transformation of the 
response variable is most likely necessary.

Graphic Analysis: scatter plot; histogram; box plot; bar chart

SAS Product: Enterprise Guide; Base SAS; SAS/STAT

Size: 92 rows, 12 columns

----------------------------------------------------------------------- 
Name:  challenger

Analysis: Logistic Regression, Probit regression

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Data documenting the presence or absence of primary O-ring thermal distress in the 23 shuttle
launches preceding the Challenger mission were collected. The focus of this data is to determine if there is a
relationship between the temperature at launch time and O-ring thermal distress. The variables in the data set are
temperature at launch (TEMP), the number of launches in which a thermal distress occurred for that temperature (TD),
the total number of launches at that temperature (TOTAL), and the number of launches in which thermal distress did not
occur at that temperature (NO_TD).

This data exists in an alternative form in the data set O-ring. In that case, the variables in the data set are 
the flight number (FLT), the temperature at launch (TEMP), and an indicator variable for whether or not there was
thermal distress during the launch (TD) (0=no distress, 1=distress).
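
The two layouts carry the same information. As a sketch of how the per-flight form collapses into the grouped form, the code below aggregates made-up flight records by launch temperature. The values are not taken from either data set.

```python
from collections import defaultdict

# Hypothetical per-flight records as (FLT, TEMP, TD); values are invented.
flights = [(1, 66, 0), (2, 70, 1), (3, 70, 0), (4, 53, 1), (5, 66, 0)]

# Accumulate distress counts and launch counts for each temperature.
counts = defaultdict(lambda: {"TD": 0, "TOTAL": 0})
for _flt, temp, td in flights:
    counts[temp]["TD"] += td
    counts[temp]["TOTAL"] += 1

# Build the grouped table; NO_TD is derived as TOTAL minus TD.
grouped = [
    {"TEMP": temp, "TD": c["TD"], "TOTAL": c["TOTAL"], "NO_TD": c["TOTAL"] - c["TD"]}
    for temp, c in sorted(counts.items())
]
```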

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 16 rows, 4 columns

----------------------------------------------------------------------- 
Name:  chips

Analysis: Crossed-Nested ANOVA

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  An engineer in a semiconductor plant investigated the effect of several modes of a process
condition (ET) on the resistance in computer chips. Twelve silicon wafers (WAFER) were drawn from a lot, and
three wafers were randomly assigned to each of four modes of ET. Resistance (RESISTANCE) in the chips was 
measured in four positions (POSITION) on each wafer after processing. 

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 48 rows, 4 columns

----------------------------------------------------------------------- 
Name:  Cholesterol

Analysis: paired sample t-test

Reference:  Generated for SAS training course

Description:  Suppose that cholesterol measurements are taken on a group of 
subjects with high cholesterol levels.  After these measurements are collected, 
the subjects attend a training session that discusses methods to control 
cholesterol levels including such things as diet and exercise.  After a 
specified period of time, cholesterol measurements are collected on each of the 
subjects again.  You want to determine whether there is a difference between the 
cholesterol measurements before and after the training.

Graphic Analysis: scatter plot, box plot

SAS Product: Enterprise Guide; SAS/STAT

Size: 95 rows, 3 columns

----------------------------------------------------------------------- 
Name:  Coffee

Analysis: contingency table; simple logistic regression

Reference:  MacMahon, B., S. Yen, D. Trichopoulos, K. Warren, and G. Nardi. 
1981. Coffee and cancer of the pancreas. New England Journal of Medicine. 
304(11). 630-33.

Description:  This example is based on a dataset relating coffee consumption to 
incidence of pancreatic cancer.  These data arose from a case-control study, and 
for this illustration we will use the data for male subjects.  Case Outcome is a 
binary category variable recording whether each individual represents a case 
(pancreatic cancer) or a control (no cancer).  Daily Coffee is a continuous 
variable recording how much coffee each individual drinks: 0 for none, 1.5 for 
1-2 cups per day, 3.5 for 3-4 cups per day, or 5.5 for 5 or more cups per day.  

Graphic Analysis:

SAS Product: Enterprise Guide; SAS/STAT

Size: 523 rows, 3 columns

----------------------------------------------------------------------- 
Name:  College

Analysis: One-Way ANOVA, Regression, ANCOVA

Reference: Money magazine, 1991.

Description: The data is a collection of information on colleges and 
universities collected in the early 1990s. The primary interest is in 
predicting graduation rates, the percent of students who graduate from the 
institution in four years. Potential predictor variables are tuition, type of 
college (public or private), and region of the country.

Graphic Analysis: Box Plots, Scatter Plots

Size: 200 rows, 6 columns

----------------------------------------------------------------------- 
Name:  Colonoscopy

Analysis: Contingency Table Analysis, Ordinal Logistic Regression

Reference:  Grossman, S., M. Milos, I. S. Tekawa, and N. P. Jewell. 1989. 
Colonoscopic screening of persons with suspected risk factors for colon cancer: 
II. Past history of colorectal neoplasms. Gastroenterology. 96. 299-306.

Description:  The data are from a prospective study of the findings of a 
colonoscopy screening study on individuals considered to be at high risk of 
colon cancer. The purpose of the study was to determine the role of past history 
in predicting the findings of a current colonoscopy.  The cases considered here 
correspond to 406 individuals who had adenoma findings in previous colon 
examinations and who are therefore considered to be at high risk of a subsequent 
significant finding. The two variables in the data set are Finding (coded 0 for 
negative examination, 1 for small adenoma, and 2 for large adenoma) and Age. All 
ages have been rounded, ages 30-39 years coded as 35, ages 40-49 years coded as 
45, etc.
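
The age coding described above replaces each age with the midpoint of its decade. A minimal sketch of that rule follows; decade_midpoint is a hypothetical helper name used only for this illustration.

```python
def decade_midpoint(age):
    """Map an age in whole years to the midpoint code of its decade."""
    # Integer division finds the start of the decade; adding 5 gives the code.
    return (age // 10) * 10 + 5


# Ages 30-39 are coded as 35 and ages 40-49 as 45.
coded = [decade_midpoint(age) for age in (30, 39, 40, 47)]
```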

Graphic Analysis: bar charts

SAS Product: Enterprise Guide; SAS/STAT

Size: 406 rows, 2 columns

----------------------------------------------------------------------- 
Name:  corn

Analysis: correlation and regression

Reference: Draper, N.R. and Smith, H. (1981) Applied Regression Analysis, Second
Edition, New York: John Wiley & Sons, Inc. 

Description:  The data was collected to examine the effect of weather related 
phenomena on corn yield. The data set includes information on the total 
precipitation (in inches) for the year prior to the start of the growing season, 
the average daily temperature (in degrees Fahrenheit) for each of the months of 
May through August, the total rain (in inches) during each of the months June 
through August, and the corn yield (in bushels per acre). This information was 
collected for each of the years 1930 through 1962. The year is also included in 
the data set. You are interested in determining the relationship between the 
corn yield and the other variables.

Graphic Analysis: scatter plots; box plots; histograms

SAS Product: Enterprise Guide; SAS/STAT; SAS/GRAPH

Size: 33 rows, 10 columns

----------------------------------------------------------------------- 
Name:  cotton

Analysis: Mixed Model

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  This data is from a two factor factorial with two stages of subsampling. The object of the study
is to estimate the weight of usable lint (LINT) from the total weight of cotton bolls (BOLLWT). In addition,
the researcher wants to see if lint estimation is affected by varieties of cotton (VARIETY) and the distance
between planting rows (SPACING). The study is a factorial experiment with two levels of VARIETY and two levels
of SPACING. There are two plants for each VARIETY x SPACING treatment combination, and there are from five
to nine bolls per plant (PLANT).

Graphic Analysis: box plots, histograms, means plots, scatter plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 49 rows, 5 columns

----------------------------------------------------------------------- 
Name:  cotton1

Analysis: Multivariate ANOVA

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  The total weight of a mature cotton boll can be divided into three parts: the weight of the seeds, 
the weight of the lint, and the weight of the bract. Lint and seed constitute the economic yield of cotton.
In this data, the differences in the three components of the cotton bolls due to two varieties (VARIETY) and two
plant spacings (SPACING) are studied. Five plants are chosen at random from each of the four treatment combinations.
Two bolls are picked from each plant, and the weights of the seeds, lint, and bract are recorded. 

Graphic Analysis: box plots, histograms, means plots, scatter plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 40 rows, 6 columns

----------------------------------------------------------------------- 
Name:  counts

Analysis: Poisson Regression

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description: This data is from an insect control experiment. The treatment design consisted of an untreated control
group (TRT=0) and a 3x3 factorial for a total of 10 treatments. The experiment was conducted as a randomized complete
block design with four blocks (BLOCK). The response variable was the insect count (COUNT). The variable CTL_TRT is
coded 0 for control and 1 otherwise. The two treatment factors, A and B, have 3 levels each (1, 2, and 3) and both
are coded 0 for the control.
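
To make the coding concrete, the sketch below builds the treatment structure described above: one control plus a 3x3 factorial, repeated in four blocks. Numbering the factorial treatments 1 through 9 is an assumption made for this illustration. The actual TRT values in the data set may differ.

```python
from itertools import product

# The control has TRT=0 with CTL_TRT, A, and B all coded 0.
treatments = [{"TRT": 0, "CTL_TRT": 0, "A": 0, "B": 0}]

# Nine factorial treatments: every combination of A and B at levels 1 to 3.
# Assumption: they are numbered 1..9 in the order product() generates them.
for trt, (a, b) in enumerate(product((1, 2, 3), repeat=2), start=1):
    treatments.append({"TRT": trt, "CTL_TRT": 1, "A": a, "B": b})

# Randomized complete block design: each of four blocks gets all ten treatments.
design = [{"BLOCK": block, **t} for block in range(1, 5) for t in treatments]
```

The resulting 40 rows match the size listed for this data set. COUNT would be the sixth column.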

Graphic Analysis: scatter plot

SAS Product: Enterprise Guide; SAS/STAT

Size: 40 rows, 6 columns

----------------------------------------------------------------------- 
Name:  cult_inoc

Analysis: Split-Plot

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  This data was collected to analyze the effect of three bacterial inoculation treatments (INOCULATION)
applied to two cultivars of grasses (CULTIVAR) on dry weight yields (DRYWT). The experiment is a split-plot 
design with CULTIVAR (levels a and b) as the main plot factor and INOCULATION as the subplot factor. INOCULATION
has the values control, live and dead.

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 24 rows, 4 columns

----------------------------------------------------------------------- 
Name:  defecttypes

Analysis: quality control: pareto charts

Reference:  Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.

Description:  You are in charge of the quality control effort at a bicycle 
manufacturer that specializes in limited production frames.  The most popular 
model your company produces is a day touring model called the "Arribe!", which 
is a racing-style frame for weekend warriors.

The seat tube has been a source of quality problems in the manufacturing plant 
in the past. This data set has information on reasons for rejection of the seat 
tubes during weeks 7 and 8 of a recent manufacturing cycle. You will use this 
data to determine if the pattern of defects is the same for both weeks.

Graphic Analysis: pareto charts

SAS Product: Enterprise Guide; SAS/QC

Size: 1000 rows, 2 columns

----------------------------------------------------------------------- 
Name:  drug

Analysis: ANOVA

Reference:  Generated for SAS training course

Description:  This is data from an experiment to evaluate the effect of four 
different drugs on blood pressure for individuals with one of three possible 
diseases. Each individual is administered one of the four drugs over a period of 
time and the increase in systolic blood pressure is recorded. You want to 
compare the average increase in blood pressure for the different drugs and 
diseases.

Graphic Analysis: box plot; histogram

SAS Product: Enterprise Guide; SAS/STAT

Size: 72 rows, 3 columns

----------------------------------------------------------------------- 
Name:  drugs

Analysis: Unbalanced Anova, Mixed Model

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  A pharmaceutical company compared effects of two drugs, A and B, on a clinical measurement called
FLUSH. The study utilized patients in 10 clinics in order to obtain representation from diverse patient populations.
The variables in the data set are STUDY (which is the clinic identifier), TREATMENT, PATIENT, FLUSH0, FLUSH. 
The values of FLUSH0 were obtained prior to administration of the drugs.

If you assume the clinics are randomly selected from a population of clinics, then clinic becomes a random effect
and the model is a mixed model.  

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 135 rows, 5 columns

----------------------------------------------------------------------- 
Name:  drugs1

Analysis: Unbalanced Anova with empty cells

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  A pharmaceutical company compared effects of two drugs, A and B, on a clinical measurement called
FLUSH. The study utilized patients in 10 clinics in order to obtain representation from diverse patient populations.
The variables in the data set are STUDY (which is the clinic identifier), TREATMENT, PATIENT, FLUSH0, FLUSH. 
The values of FLUSH0 were obtained prior to administration of the drugs. (Note: this is the same data as the DRUGS
data set with the addition of STUDY 41 which only had observations for drug B. Therefore, there is an empty cell
associated with the STUDY 41, DRUG A combination.)

If you assume that the clinics were randomly selected from a larger population of clinics, then CLINIC is a random
effect and the model becomes a mixed model.

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 151 rows, 5 columns

----------------------------------------------------------------------- 
Name:  Exercise

Analysis: ANOVA, MANOVA

Reference:  Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc. 108.

Description:  You are an exercise physiologist who wants to determine whether 
stretching and wearing ankle weights has any effect on the value of treadmill 
exercise.  You could test this hypothesis by measuring calories burned, average 
speed in meters per minute, and oxygen consumed in liters for a number of 
subjects who you have previously determined have roughly the same level of 
physical fitness, divided randomly into four groups: with or without ankle 
weights, and with or without a period of stretching before the exercise.  

Graphic Analysis: box plots, histograms

SAS Product: Enterprise Guide (ANOVA); SAS/STAT

Size: 20 rows, 5 columns

----------------------------------------------------------------------- 
Name:  FEV1MULT

Analysis: Repeated Measures

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description: A pharmaceutical company examined effects of three drugs on respiratory ability of asthma patients.
The drugs were randomly assigned to 24 patients each. The assigned drug was administered to each patient. Then a
standard measure of respiratory ability called FEV1 was measured hourly for eight hours following treatment. FEV1
was also measured immediately prior to administering the drug (BASEFEV1). 

This data set is organized to perform a multivariate analysis of repeated measures data. That is, every
patient appears in only one row of the data set with a separate column for each of the eight FEV1 measurements.

The FEV1UNI data set is organized to perform a univariate ANOVA of the repeated measures data. That is, every FEV1
measurement is a different row in the data set. Therefore, each patient has 8 rows in the data set.
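
The relationship between the two layouts can be sketched as a wide-to-long reshape. The column names below (PATIENT, DRUG, HOUR, and FEV1_H1 through FEV1_H8) and the values are hypothetical, chosen only to illustrate the idea. Only BASEFEV1 is a name taken from the description above.

```python
HOURS = range(1, 9)  # eight hourly measurements after treatment


def wide_to_long(wide_rows):
    """Turn one row per patient into one row per patient-hour measurement."""
    long_rows = []
    for row in wide_rows:
        for hour in HOURS:
            long_rows.append({
                "PATIENT": row["PATIENT"],
                "DRUG": row["DRUG"],
                "BASEFEV1": row["BASEFEV1"],
                "HOUR": hour,
                # Hypothetical wide column name for the measurement at this hour.
                "FEV1": row[f"FEV1_H{hour}"],
            })
    return long_rows


# One made-up patient in the wide (multivariate) layout.
wide = [{"PATIENT": 201, "DRUG": "a", "BASEFEV1": 2.5,
         **{f"FEV1_H{h}": 3.0 - 0.1 * h for h in HOURS}}]

long_rows = wide_to_long(wide)
```

Applied to all 72 patients, this reshape produces 72 x 8 = 576 rows, which agrees with the sizes listed for the two data sets.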

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 72 rows, 11 columns

----------------------------------------------------------------------- 
Name:  FEV1UNI

Analysis: Repeated Measures

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description: A pharmaceutical company examined effects of three drugs on respiratory ability of asthma patients.
The drugs were randomly assigned to 24 patients each. The assigned drug was administered to each patient. Then a
standard measure of respiratory ability called FEV1 was measured hourly for eight hours following treatment. FEV1
was also measured immediately prior to administering the drug (BASEFEV1). 

This data set is organized to perform a univariate ANOVA of the repeated measures data. That is, every FEV1
measurement is a different row in the data set. Therefore, each patient has 8 rows in the data set.

The data set FEV1MULT is organized to perform a multivariate analysis of repeated measures data. That is, every
patient appears in only one row of the data set with a separate column for each of the eight FEV1 measurements.


Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 576 rows, 5 columns

----------------------------------------------------------------------- 
Name:  filemarks

Analysis: quality control: c/u charts

Reference:  Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.

Description:  You are in charge of the quality control effort at a bicycle 
manufacturer that specializes in limited production frames.  The most popular 
model your company produces is a day touring model called the "Arribe!", which 
is a racing-style frame for weekend warriors.

The seat tubes of the bicycle frames are inspected. One of the most common 
problems with the tubes is stray file marks. Although this does not affect the 
functionality of the seat tube, it does affect the looks. The filemarks data set 
contains 10 weeks of inspection data and has the total number of file marks for 
all bicycle tubes inspected during each of the ten weeks. 

Graphic Analysis: quality control charts

SAS Product: Enterprise Guide; SAS/QC

Size: 10 rows, 3 columns

----------------------------------------------------------------------- 
Name:  Flaxoil

Analysis: Randomized Complete Block ANOVA, Mixed Model ANOVA

Reference:  Steel, R. G. D., and Torrie, J.H. 1980. Principles and Procedures of 
Statistics: a Biometrical Approach. McGraw-Hill, New York.

Description:  The Flax Oil data set includes percentage measurements of oil 
content in flaxseed grown in each of four different locations for six different 
treatments.  At each location one plant was inoculated with bacteria as a 
seedling, one plant in early bloom, one in full bloom, one at a lower dose in 
full bloom, and one when the plant was ripening.  A sixth plant in each location 
was a control case, not inoculated at all.  There was no replication of 
treatment by location combinations. The purpose of the experiment was to 
determine whether the treatments had any effect on the oil content of the 
flaxseed. The four locations represent the blocks in this experiment. One could 
consider that the blocking factor should be treated as a random effect, which 
would result in a mixed model.

Graphic Analysis: histogram; box plot

SAS Product: Enterprise Guide (ANOVA); SAS/STAT (Mixed Model)

Size: 24 rows, 3 columns

----------------------------------------------------------------------- 
Name:  fr_t7_3

Analysis: Logistic ANCOVA, Logistic Regression

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description: This data set has data from a bioassay involving two drugs, standard (STD) and treated (TRT), injected
in varying dosages to 20 mice per treatment-dose combination. The response variable of interest is the number of
mice (out of the 20) that are ALIVE versus the number DEAD. An alternative analysis approach is to use the 
base 2 log of dosage rather than the actual dosages. This is included as the variable X.

Graphic Analysis: scatter plot

SAS Product: Enterprise Guide; SAS/STAT

Size: 9 rows, 6 columns

----------------------------------------------------------------------- 
Name:  garments

Analysis: Latin Square Design

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Four materials (MATERIAL) used in permanent press garments are subjected to a test for 
weight loss (WTLOSS) and shrinkage (SHRINK). The materials are placed in a heat chamber that has four 
control settings or positions (POSITION). The test is conducted in four runs (RUN), with each material
assigned to each of the four positions in one run of the experiment. The weight loss and shrinkage are
measured on each sample after each test.

Graphic Analysis: box plots, histograms

SAS Product: Enterprise Guide; SAS/STAT

Size: 16 rows, 5 columns

----------------------------------------------------------------------- 
Name:  grasses

Analysis: Two-way ANOVA, Mixed Model

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Three methods of promoting seed growth (METHOD) are applied to seed from each of five varieties (VARIETY).
Six pots are planted with seed from each METHODxVARIETY combination. The resulting 90 pots were randomly placed in a 
growth chamber and the dry matter yields were measured after clipping at the end of four weeks.

The data are recorded in the data set with each of the six replicate measurements on the same row in the data set.
Therefore, this data will have to be reorganized in order to analyze it as a two-way ANOVA. This can be done using
the stack columns function in Enterprise Guide or with a data step program. (Note: the data set grasses1 contains the 
same data reorganized for analysis.)
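
The reorganization described above amounts to stacking the six replicate columns so that each pot gets
its own row. The following is a minimal sketch in Python, not the Enterprise Guide stack columns task or
a data step; the replicate column names Y1 through Y6, the output names REP and YIELD, and the yield
values are assumptions made for illustration.

```python
def stack_columns(rows, id_cols, value_cols, value_name="YIELD", rep_name="REP"):
    """Turn one wide row per METHOD x VARIETY cell into one long row per pot."""
    long_rows = []
    for row in rows:
        for rep, col in enumerate(value_cols, start=1):
            new_row = {key: row[key] for key in id_cols}  # carry the factors along
            new_row[rep_name] = rep                       # replicate number 1..6
            new_row[value_name] = row[col]                # the stacked measurement
            long_rows.append(new_row)
    return long_rows

# Two hypothetical wide rows; the real data set has 15 such rows.
wide = [
    {"METHOD": "A", "VARIETY": 1,
     "Y1": 22.1, "Y2": 24.1, "Y3": 19.1, "Y4": 22.1, "Y5": 25.1, "Y6": 18.1},
    {"METHOD": "A", "VARIETY": 2,
     "Y1": 27.1, "Y2": 15.1, "Y3": 20.6, "Y4": 28.6, "Y5": 15.1, "Y6": 24.6},
]
value_cols = ["Y1", "Y2", "Y3", "Y4", "Y5", "Y6"]
long_rows = stack_columns(wide, ["METHOD", "VARIETY"], value_cols)
```

Applied to all 15 wide rows, the same loop would yield 15 x 6 = 90 long rows, one per pot.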

In the case where you are interested in a whole population of varieties and these five are a random sample from that
population, VARIETY would be treated as a random effect, resulting in a mixed model.

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 15 rows, 9 columns

----------------------------------------------------------------------- 
Name:  grasses1

Analysis: Two-way ANOVA, Mixed Model

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Three methods of promoting seed growth (METHOD) are applied to seed from each of five varieties (VARIETY).
Six pots are planted with seed from each METHODxVARIETY combination. The resulting 90 pots were randomly placed in a 
growth chamber and the dry matter yields were measured after clipping at the end of four weeks (YIELD).

Note: this data set contains the same data as the data set grasses except the data has been reorganized for analysis.

In the case where you are interested in a whole population of varieties and these five are a random sample from that
population, VARIETY would be treated as a random effect, resulting in a mixed model.

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 90 rows, 4 columns

----------------------------------------------------------------------- 
Name:  Hospice

Analysis: Nonparametric ANOVA

Reference:  Kathryn Skarzynski, Wright State University, Dayton, OH

Description:  Consider a study done to determine whether there was a change in 
the number of referrals received from physicians after a visit by a hospice 
marketing nurse.  A portion of Ms. Skarzynski's data about these hospice 
marketing visits includes physician ID, type of visit, type of practice, date, 
and change in referrals after one month (change1) and after three months 
(change3).

Graphic Analysis: histogram; box plot

SAS Product: Enterprise Guide, SAS/STAT

Size: 54 rows, 6 columns

----------------------------------------------------------------------- 
Name:  Leppik

Analysis: Poisson Regression with Repeated Measures

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description: This data is from a study evaluating a new treatment for epilepsy. The variable ID identifies each
patient in the study. The treatments are TRT=0, a placebo, and TRT=1, an anti-epileptic drug. The response
variable is the number of seizures over a two-week interval. For the eight weeks prior to placing the
participants on treatment, the number of seizures was counted for each patient in order to form a baseline 
measurement (BASE). The patients' ages (AGE) in years are also included in the data set. The number of seizures
was recorded for each of four two-week time intervals after being placed on the treatment and appears in the 
data set as Y1 through Y4.

Two additional variables appear in the data set that might be used during the analysis. One is the log of age (LOG_AGE)
and the other is the log of (BASE/4).

This data set will need to be reorganized for analysis so that one post-treatment observation appears on each row
of the data set. This can be done with Enterprise Guide using the stack function, or with a data step. Also, the 
appropriately organized data is included in the data set SEIZURE.

Graphic Analysis: scatter plot

SAS Product: Enterprise Guide; SAS/STAT

Size: 59 rows, 10 columns

----------------------------------------------------------------------- 
Name:  Lipid Data

Analysis: Descriptive Statistics, one-sample t-tests, paired t-tests, 
correlation analysis, regression, ANOVA

Reference:  Dr. Terence T. Kuske, Professor of Medicine, Medical College 
of Georgia, Augusta, GA.

Description:  Data has been collected from blood lipid screenings as well as 
patient history. Information such as gender, age, weight, total cholesterol 
level, blood pressure, coffee consumption, and history of heart disease was 
collected. The blood lipid screenings were conducted three months after the 
initial screenings. This data is rich for various analyses. 

Graphic Analysis: histograms, box plots, bar charts, scatter plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 95 rows, 25 columns

----------------------------------------------------------------------- 
Name:  Marathons

Analysis: Two-sample t-test

Reference:  

Description:  You are interested in comparing the time it takes to run the 
marathon in New York City and Boston.  A random sample of 50 observations from 
the Boston marathon and 100 observations from the New York marathon have been 
recorded and saved.  The variables in the dataset include city and time (in 
hours).

Graphic Analysis: histogram; box plot

SAS Product: Enterprise Guide

Size: 150 rows, 2 columns

----------------------------------------------------------------------- 
Name:  market

Analysis: simple linear regression

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  This is data from 19 livestock auction markets, including the numbers of head of cattle sold (in
thousands) (CATTLE), the cost of operations of the auction market (in thousands of dollars) (COST), 
and the market identifier (MARKETID). The object is to use simple linear regression to describe the relationship
between the cost of operations and the number of cattle sold. COST will be the dependent variable and CATTLE the
independent variable.
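
For reference, the least-squares line for a simple linear regression has slope Sxy/Sxx and an intercept
equal to the mean of the response minus the slope times the mean of the predictor. The sketch below is in
Python rather than SAS; the function name is illustrative and the five data points are invented so that
the fit is exact. They are not taken from the market data set.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)                       # spread of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical values: CATTLE in thousands of head, COST in thousands of dollars.
cattle = [1.0, 2.0, 3.0, 4.0, 5.0]
cost = [13.0, 15.0, 17.0, 19.0, 21.0]   # constructed as 11 + 2 * cattle

intercept, slope = fit_line(cattle, cost)
```

Here the slope would be read as the change in operating cost (in thousands of dollars) per additional
thousand head sold.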

Graphic Analysis: scatter plot

SAS Product: Enterprise Guide; SAS/STAT

Size: 19 rows, 3 columns

----------------------------------------------------------------------- 
Name:  methods

Analysis: One-way ANOVA, ANOVA with blocks, Mixed Model with random block

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Five methods of providing irrigation (IRRIG) are used on an orange grove. At harvest, the fruit is
weighed to determine if the method of irrigation affects fruit weight (FRUITWT). In this case, the grove was divided
into eight blocks (BLOCK) to account for local variation in the grove. The assignment of the irrigation method to
the trees within the block was done randomly and each of the irrigation methods appears in every block. Therefore,
this is a randomized complete block design.

The blocking factor can also be considered a random effect in which case the analysis would be done with a mixed 
model.

Graphic Analysis: box plots, histogram

SAS Product: Enterprise Guide; SAS/STAT

Size: 40 rows, 3 columns

----------------------------------------------------------------------- 
Name:  Microbgs

Analysis: Nested Mixed Models

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Microbial counts are made on samples of ground beef in a study to assess sources of variation 
in numbers of microbes. Twenty packages of ground beef (PACKAGE) are purchased. Three samples are drawn from
each package and two replicate counts are made on each sample. In the data set CT11 refers to the first sample,
first replicate count for the package; CT12 refers to the first sample, second replicate count; CT21 refers
to the second sample, first replicate count; and so on. Again this data set will have to be reorganized before 
analysis. (Note: the data set microbgs1 contains the reorganized data).

Also, because of the skewed nature of the data, it is common to take the logarithm of the counts for analysis
purposes. 

In this case, the sample is nested within the package and package is a random effect.

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 20 rows, 7 columns

----------------------------------------------------------------------- 
Name:  Microbgs1

Analysis: Nested Mixed Models

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Microbial counts are made on samples of ground beef in a study to assess sources of variation 
in numbers of microbes. Twenty packages of ground beef (PACKAGE) are purchased. Three samples are drawn from
each package (SAMPLE) and two replicate counts are made on each sample (REPLICATE).  (Note: the data set 
here is the same as the data set microbgs except that the data has been reorganized for analysis).

Also, because of the skewed nature of the data, it is common to take the logarithm of the counts for analysis
purposes. 

In this case, the sample is nested within the package and package is a random effect.

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 20 rows, 7 columns

----------------------------------------------------------------------- 
Name:  Nosocomial

Analysis: ANOVA, ANCOVA, Regression

Reference:  

Description:  The data are from a study conducted to determine whether the risk of 
nosocomial (hospital-acquired) infection is affected by other hospital 
characteristics such as: type of hospital (public or private), average number of 
patients at the hospital, average age of the patients, average number of beds, 
and average number of nurses on staff.

Graphic Analysis: scatter plots; histograms; box plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 113 rows, 7 columns


----------------------------------------------------------------------- 
Name:  oranges

Analysis: ANCOVA

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  The data are from a study of the relationship between the price of oranges and sales per customer.
The hypothesis is that sales vary as a function of price differences for different stores (STORE) and days of 
the week (DAY). The price is varied daily for two varieties of oranges. The variables P1 and P2 denote the prices
for the two varieties, respectively. The variables Q1 and Q2 are the sales per customer of the corresponding 
varieties. Q1 and Q2 are used as the dependent variables, with STORE, DAY, P1, and P2 as the independent variables.

Graphic Analysis: box plots, histograms, means plots, scatter plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 36 rows, 6 columns

----------------------------------------------------------------------- 
Name:  oysters

Analysis: ANCOVA

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Four bags with 10 oysters in each bag are randomly placed at each of five stations in the cooling water
canal of a power-generating plant. Each location, or station, is considered a treatment and is represented by the
variable TRT in the data set. Each bag is considered to be one experimental unit. Two stations are located in the 
intake canal, and two stations are located in the discharge canal, one at the top and the other at the bottom of
each location. A single mid-depth station is located in a shallow portion of the bay near the power plant. The 
treatments are coded 1 through 5 in the data set as follows: (1) intake-bottom, (2) intake-surface, 
(3) discharge-bottom, (4) discharge-surface, and (5) bay.

The purpose of the experiment is to determine if exposure to water heated artificially affects growth and if the
position in the water column (surface or bottom) affects growth. Stations in the intake canal act as controls for
those in the discharge canal, which has a higher temperature. The station in the bay is an overall control in case
some factor other than the heat difference due to water depth or location is responsible for an observed change
in growth rate.

The oysters are cleaned and measured at the beginning of the experiment (INITIAL) and again about one month
later (FINAL). These two weights are recorded for each bag.

Graphic Analysis: box plots, histograms, means plots, scatter plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 20 rows, 4 columns

----------------------------------------------------------------------- 
Name:  o_ring

Analysis: Logistic Regression, Probit regression

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Data documenting the presence or absence of primary O-ring thermal distress in the 23 shuttle
launches preceding the Challenger mission were collected. The focus of this data is to determine if there is a
relationship between the temperature at launch time and O-ring thermal distress. The variables in the data set are 
the flight number (FLT), the temperature at launch (TEMP), and an indicator variable for whether or not there was
thermal distress during the launch (TD) (0=no distress, 1=distress).

This data exists in an alternative form in the data set Challenger. In that case, the variables in the data set are
temperature at launch (TEMP), the number of launches in which thermal distress occurred for that temperature (TD),
the total number of launches at that temperature (TOTAL), and the number of launches in which thermal distress did not
occur at that temperature (NO_TD).
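
The relationship between the two layouts can be sketched as a grouping step: launch-level records
(TEMP, TD) are collapsed into one row per temperature with TD, TOTAL, and NO_TD counts. The Python code
below only illustrates that idea. The function name and the launch records are hypothetical, and this is
not a description of how the Challenger data set was actually built.

```python
from collections import defaultdict

def group_by_temp(launches):
    """Collapse (TEMP, TD) launch records into one row per temperature."""
    totals = defaultdict(lambda: [0, 0])   # temp -> [distress count, launch count]
    for temp, td in launches:
        totals[temp][0] += td              # td is 1 for distress, 0 otherwise
        totals[temp][1] += 1
    return [
        {"TEMP": temp, "TD": td, "TOTAL": total, "NO_TD": total - td}
        for temp, (td, total) in sorted(totals.items())
    ]

# Hypothetical launch records: (temperature at launch, thermal distress indicator).
launches = [(53, 1), (70, 1), (70, 0), (70, 0), (75, 0), (75, 1)]
grouped = group_by_temp(launches)
```

The launch-level layout suits a binary-response logistic model, while the grouped layout suits an
events/trials specification with TD as the events and TOTAL as the trials.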

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 23 rows, 3 columns

----------------------------------------------------------------------- 
Name:  peppers

Analysis: Descriptive statistics, confidence interval, one-sample t-test

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  An engineer wants to design a mechanical harvester for bell peppers. He measured and recorded the
angle at which peppers hang on the plant. The purpose of the analysis is to construct a 95% confidence
interval for the mean and to determine if the average angle is equal to zero.

Graphic Analysis: box plot, histogram

SAS Product: Enterprise Guide; SAS/STAT

Size: 28 rows, 1 column

----------------------------------------------------------------------- 
Name:  pressure

Analysis: Paired-sample t-test; one-sample t-test; regression; ANCOVA

Reference:  Generated for SAS Course Notes

Description:  Consider an experiment to examine the effectiveness of a 
medication in reducing blood pressure.  A random sample of individuals with high 
blood pressure is taken and their diastolic pressure is recorded.  The 
individuals are then placed on medication and one month later their diastolic 
blood pressure is once again recorded.  The dataset contains the following 
variables: subject, age, baseline blood pressure, and new blood pressure.

In addition there is a column in the data set that includes the type of drug 
taken (new, approved, or placebo). If this information is included, an 
analysis of covariance can be done.

Graphic Analysis: histograms, box plots, scatter plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 93 rows, 4 columns

----------------------------------------------------------------------- 
Name:  pulse

Analysis: Descriptive statistics, confidence interval, paired-sample t-test

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  A drug is administered to animals and pulse rates before (PRE) and after (POST) the drug are recorded.
An additional column, D, is included in the data set, which is the difference between PRE and POST. The purpose of
the experiment is to determine if the drug changes the pulse rates of the animals.
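
A paired-sample t-test is a one-sample t-test on the within-animal differences: t is the mean difference
divided by its standard error, with n - 1 degrees of freedom. The Python sketch below assumes D is
computed as PRE minus POST (the direction is not stated above) and uses invented pulse rates, not the
values in the pulse data set.

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Return (differences, t statistic, degrees of freedom) for paired data."""
    diffs = [a - b for a, b in zip(pre, post)]   # assumed direction: PRE - POST
    n = len(diffs)
    std_err = stdev(diffs) / math.sqrt(n)        # standard error of the mean difference
    return diffs, mean(diffs) / std_err, n - 1

# Hypothetical pulse rates for five animals.
pre = [62, 63, 58, 64, 64]
post = [61, 62, 59, 61, 63]

d, t_stat, df = paired_t(pre, post)
```

The t statistic would then be compared with a t distribution on df degrees of freedom to judge whether
the mean change differs from zero.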

Graphic Analysis: box plot, histogram

SAS Product: Enterprise Guide; SAS/STAT

Size: 15 rows, 3 columns

----------------------------------------------------------------------- 
Name:  rats

Analysis: ANOVA, MANOVA

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Weight gains in rats given a special diet were measured at one (GAIN1), two (GAIN2),
three (GAIN3), and four (GAIN4) weeks after beginning the administration of the diet. The question of
interest is whether the rats' weight gains stayed constant over the course of the experiment. In other 
words, were the average weight gains the same at each of the four weeks? An intercept-only MANOVA model 
can be used to answer this question, or the data can be stacked and an ANOVA used.

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 10 rows, 4 columns

See Index_Sorted for SAS products and analysis type per sample dataset.

----------------------------------------------------------------------- 
Name:  Sales

Analysis: Logistic Regression

Reference:  Generated for use in SAS course notes

Description:  A mail-order company wants to identify those customers to whom 
their advertising efforts should be directed.  They have decided that customers 
who spend 100 dollars or more are their target group.  They have collected data 
on their customers such as purchase level (1 = at least $100; 0 = less than 
$100), gender, income level (Low, Medium, or High), and age.

Graphic Analysis: bar chart, histogram

SAS Product: Enterprise Guide; SAS/STAT

Size: 431 rows, 4 columns


----------------------------------------------------------------------- 
Name:  seizure

Analysis: Poisson Regression with Repeated Measures

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description: This data is from a study evaluating a new treatment for epilepsy. The variable ID identifies each
patient in the study. The treatments are TRT=0, a placebo, and TRT=1, an anti-epileptic drug. The response
variable is the number of seizures over a two-week interval. For the eight weeks prior to placing the
participants on treatment, the number of seizures was counted for each patient in order to form a baseline 
measurement (BASE). The patients' ages (AGE) in years are also included in the data set. The number of seizures (Y)
was recorded for each of four two-week time intervals after being placed on the treatment. The variable TIME
indicates which two-week time period is recorded on that row (1=first two weeks, 2=second two weeks, etc.).

Two additional variables appear in the data set that might be used during the analysis. One is the log of age (LOG_AGE)
and the other is the log of (BASE/4).

This data set is the same data that appears in the data set LEPPIK, but has been reorganized for analysis.

Graphic Analysis: scatter plot

SAS Product: Enterprise Guide; SAS/STAT

Size: 236 rows, 8 columns

----------------------------------------------------------------------- 
Name:  teachers

Analysis: ANOVA, MANOVA

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description:  Student exam scores are collected, where each student is taught by one of three teachers.
The purpose of the analysis is to compare the average scores for each of the three teachers. The teachers can
be compared using either the score on the first exam (SCORE1) or the score on the second exam (SCORE2).

A multivariate ANOVA (MANOVA) can be used to determine if there is a difference between teachers when considering
both SCORE1 and SCORE2 simultaneously.

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 30 rows, 3 columns

----------------------------------------------------------------------- 
Name:  Teaching

Analysis: Repeated Measures ANOVA

Reference:  Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.

Description:  A study was conducted in an industrial setting to test the 
effectiveness of several techniques for teaching the use of a respirator mask.  
Subjects were divided randomly into three groups: a control group that received 
no training in the use of the mask, a group that received a detailed instruction 
sheet, and a third group that attended a thirty-minute class.  The effectiveness 
of the mask was measured for each of the subjects before training and also one 
and two weeks after training.  The purpose of the study is to determine whether, 
on average, there is any difference in effectiveness among the three teaching 
techniques. 

Graphic Analysis: box plot, histogram, scatter plot

SAS Product: SAS/STAT

Size: 28 rows, 4 columns

----------------------------------------------------------------------- 
Name:  Totarrests

Analysis: Descriptive statistics, time series analysis, paired sample t-test, ANOVA

Reference:  U.S. Department of Justice, Office of Justice Programs, Bureau of Justice Statistics
 http://www.ojp.usdoj.gov/bjs/dtdata.htm

Description: The data set is a record of the total number of arrests in the United States from 
1970 through 1999. The variables in the data set are: year (YEAR), total number of arrests (TOTALARRESTS),
total number of arrests by age group (AGE1 through AGE5), arrests per 100 thousand in population,
total (ARRESTRATE) and by age group (AGE1RATE through AGE5RATE), total population of the U.S. on July
1 of the given year (POPULATION), and total population for each age group (AGE1POP through AGE5POP).

The data set could be used to generate descriptive statistics for the total population and by age group, or 
to do a time series analysis and predict total number of arrests for the population as a whole or by
age group. These predictions might be used in assessing the need for judicial system infrastructure
changes. Finally, the data could be used to compare age groups with an ANOVA, but the data would have to 
be reorganized to conduct such an analysis using the stack function in Enterprise Guide or a data step. 

The data set arrestrates has the arrest rates for the 5 age groups organized to do an analysis of variance.
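
An arrest rate per 100 thousand in population is the arrest count divided by the population, scaled by
100,000. The short Python sketch below shows the arithmetic. It assumes the rate columns were derived
this way from the count and population columns, which the description does not state, and the figures
are hypothetical rather than values from the data set.

```python
def rate_per_100k(arrests, population):
    """Arrests per 100,000 people."""
    return arrests * 100_000 / population

# Hypothetical figures: 10 million arrests in a population of 250 million.
example_rate = rate_per_100k(10_000_000, 250_000_000)
```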

Graphic Analysis: scatter plots

SAS Product: Enterprise Guide; SAS/STAT; SAS/ETS

Size: 30 rows, 19 columns

----------------------------------------------------------------------- 
Name:  Tree

Analysis: Regression, ANCOVA, Polynomial Regression, Nonlinear Regression

Reference:  StatView Reference, 2nd edition. 1998. SAS Institute Inc.

Description:  In the 1930s, the weights and trunk girths were measured for 
eight specimens from each of thirteen rootstocks, for a total of 104 tree 
specimens.  The purpose is to determine if the girth and/or rootstock of the 
trees are useful in predicting the weight of trees.  This would make it possible 
to get accurate estimates of weight without having to cut trees down and weigh 
them, a destructive and difficult process.

Graphic Analysis: box plot, scatter plot, histogram

SAS Product: Enterprise Guide; SAS/STAT

Size: 104 rows, 3 columns

----------------------------------------------------------------------- 
Name:  tubeangle

Analysis: quality control: xbar, r, and s charts, capability analysis

Reference:  Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.

Description:  You are in charge of the quality control effort at a bicycle 
manufacturer that specializes in limited production frames.  The most popular 
model your company produces is a day touring model called the "Arribe!", which 
is a racing-style frame for weekend warriors.

The seat tube angle of a bicycle frame can dramatically affect the finished 
bicycle's handling characteristics.  This is the angle formed by the 
intersection of the tube that holds the seat post with the top horizontal frame 
tube.  A small seat tube angle endows the frame with forgiving handling 
characteristics.  Weekend warriors want frames that are responsive and quick; 
they prefer frames with steep seat tube angles.  The "Arribe!" is manufactured 
with these specifications in mind. The purpose of this analysis is to determine 
if the manufacturing process is in control. The target angle is 74 degrees, with 
specification limits of 73.7 and 74.3 degrees.
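
For the capability analysis, two common indices are Cp = (USL - LSL) / (6 * sigma) and
Cpk = min(USL - mean, mean - LSL) / (3 * sigma). The Python sketch below uses the specification limits
given above. It estimates sigma with the overall sample standard deviation, which is one of several
conventions, and the angle measurements are invented for illustration.

```python
from statistics import mean, stdev

LSL, TARGET, USL = 73.7, 74.0, 74.3   # specification limits and target, in degrees

def capability(values, lsl=LSL, usl=USL):
    """Return (Cp, Cpk) using the overall sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    cp = (usl - lsl) / (6 * sigma)                 # potential capability
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)    # penalizes an off-center mean
    return cp, cpk

# Hypothetical seat tube angles in degrees (not the real measurements).
angles = [73.9, 74.0, 74.1, 74.0, 74.0]
cp, cpk = capability(angles)
```

Because the assumed measurements are centered on the target, Cp and Cpk come out essentially equal; an
off-center process would show Cpk below Cp.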

Graphic Analysis: quality control charts

SAS Product: Enterprise Guide; SAS/QC

Size: 100 rows, 2 columns

----------------------------------------------------------------------- 
Name:  TubeDefects

Analysis: quality control, p and np charts

Reference: Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.

Description:  Often times, it is more cost-effective to simply evaluate whether 
an item is defective or not.  This data is recorded from frame tubes prior to 
assembly.  Frame tubes need to be meticulously filed, mitered and sanded before 
they are joined into a complete frame.  The tube ends are then inspected to 
assure that they fit together properly.  Rather than base your analyses on each 
of the measures that affect whether tubes fit together, you will analyze a 
single characteristic, specifically whether each individual tube is defective or 
not.
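
For a p chart on defective/not-defective data, the center line is the overall proportion defective
(p-bar) and the three-sigma limits are p-bar plus or minus 3 * sqrt(p-bar * (1 - p-bar) / n), where n is
the subgroup size. The Python sketch below assumes equal-sized subgroups; the subgroup size of 50 and
the counts are hypothetical and are not taken from the TubeDefects data set.

```python
import math

def p_chart_limits(defectives, subgroup_size):
    """Return (lcl, p_bar, ucl) for a p chart with equal-sized subgroups."""
    p_bar = sum(defectives) / (len(defectives) * subgroup_size)
    sigma = math.sqrt(p_bar * (1 - p_bar) / subgroup_size)
    lcl = max(0.0, p_bar - 3 * sigma)    # a proportion cannot fall below 0
    ucl = min(1.0, p_bar + 3 * sigma)    # or rise above 1
    return lcl, p_bar, ucl

# Hypothetical number of defective tubes in six subgroups of 50 tubes each.
defectives = [4, 6, 5, 3, 7, 5]
p_lcl, p_bar, p_ucl = p_chart_limits(defectives, subgroup_size=50)
```

An np chart plots the defective counts themselves, so its center line and limits are those of the
p chart multiplied by n.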

Graphic Analysis: histogram, box plot, scatter plot

SAS Product: Enterprise Guide; SAS/QC

Size: 960 rows, 2 columns

----------------------------------------------------------------------- 
Name:  Turnips 

Analysis: ANOVA with two blocks, a Latin square design

Reference:  Steel, R. G. D., and Torrie, J.H. 1980. Principles and Procedures of 
Statistics: a Biometrical Approach. McGraw-Hill, New York.

Description:  To determine whether the moisture content of turnip green leaves 
is affected by time in storage, researchers classified the leaves of five turnip 
plants into five size groups, subjected these leaves to one of five lengths of 
storage time according to a specific pattern, and finally measured the moisture 
content of each leaf.  Plant and leaf size are the blocking factors for this 
experiment.

Graphic Analysis: box plot, histogram

SAS Product: Enterprise Guide; SAS/STAT
 
Size: 25 rows, 4 columns

----------------------------------------------------------------------- 
Name:  type_dose

Analysis: ANOVA

Reference:  Littell, Ramon C., Stroup, Walter W., and Freund, Rudolf J. 2002. SAS for Linear Models, 4th Edition.
Cary, N.C.: SAS Institute Inc.

Description: This data set contains data from an experiment designed to compare the response to increasing dosage
for two types of drugs. There were three levels of the actual dosage (DOSE). The data were analyzed using the base
10 log of dose (LOGDOSE). The experiment was conducted as a randomized complete block design, where BLOCK is the 
blocking factor. Y is the response variable. 

If the blocking factor is treated as a random effect this becomes a mixed model. 

Graphic Analysis: box plots, histograms, means plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 24 rows, 6 columns

----------------------------------------------------------------------- 
Name:  Ulcers

Analysis: nonparametric ANOVA, transformation of response for ANOVA

Reference:  

Description:  Consider an experiment to investigate the content of gastric 
juices of patients.  The goal of the experiment is to determine the average 
lysozyme level. (Lysozyme is an enzyme that can destroy the cell walls of some kinds 
of bacteria.)  

Graphic Analysis: histogram, box plot, normal probability plot

SAS Product: Enterprise Guide; SAS/STAT

Size: 60 rows, 2 columns

----------------------------------------------------------------------- 
Name:  veneer

Analysis: nonparametric ANOVA

Reference:  Generated for SAS training course

Description:  Consider an experiment to investigate the durability of three 
brands of synthetic wood veneer. This type of veneer is often used in office 
furniture and on kitchen countertops. To determine durability, samples of each 
of the three brands were subjected to a friction test. The amount of veneer 
material that is worn away due to friction is measured. The resulting wear 
measurement is recorded for each sample. Brands that have small measurements are 
desirable. 

Graphic Analysis: box plot, histogram

SAS Product: Enterprise Guide; SAS/STAT

Size: 30 rows, 2 columns

----------------------------------------------------------------------- 
Name:  WCGS (Western Collaborative Group Study)

Analysis: Survival Analysis

Reference:  Rosenman, R.H., Brand, R.J., Jenkins, C.D. et al. 1975. Coronary 
heart disease in the Western Collaborative Group Study. Journal of the American 
Medical Association. 233: 872-877.

Description:  The data are from a prospective study of the occurrence of 
coronary events, usually heart attacks.  Covariates that may influence the risk 
of a coronary event include smoking, blood pressure history, and cholesterol 
level.  These data come from a group of 3,154 male employees recruited from ten 
California companies during 1960-1961.  The original purpose of the study was to 
investigate the effects of behavior type and smoking habits on heart disease.  
After the recruitment, the study followed participants for nine years, although 
a few were lost to follow-up before the end of the study.  The time variable of 
interest was the interval from entry into the study until the appearance, as 
determined by a medical expert, of coronary heart disease.  

The dataset contains event time and censor variables for 614 participants, as 
well as measurements of two covariates of interest: smoking behavior at study 
entry and behavior type.  Individuals were classified into behavior types on the 
basis of an interview; in general terms, Type A behavior is characterized by 
aggressiveness and competitiveness, whereas Type B behavior is considered more 
relaxed and noncompetitive.  In this subsample, events were observed in 60 
individuals.
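
To make the roles of the event time and censor variables concrete, here is a 
minimal Kaplan-Meier sketch in Python.  The follow-up times are hypothetical 
(not taken from the WCGS data set), and the coding 1 = event observed, 
0 = censored is an assumption; check how the censor variable is actually coded 
before applying the same logic to the real data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with an observed event.

    events[i] is 1 if the event was observed at times[i], 0 if censored
    (assumed coding).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in years (illustration only).
times = [2.0, 3.5, 3.5, 5.0, 9.0, 9.0]
events = [1, 1, 0, 1, 0, 0]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(t, round(s, 4))
```

Censored participants reduce the number at risk but do not lower the survival 
estimate, which is why participants lost to follow-up can still contribute 
information.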

Graphic Analysis: bar charts, mosaic plots

SAS Product: Enterprise Guide; SAS/STAT

Size: 614 rows, 4 columns

----------------------------------------------------------------------- 
Name:  westernrates

Analysis: correlation

Reference:  Places Rated Almanac by Roger Boyer and David Savageau, Rand
McNally.

Description:  Different western cities are rated by nine criteria.  For all but 
two of the variables, the higher the score, the better.  For Housing and Crime, 
the lower the score, the better.  You want to determine whether there is a 
linear correlation between any two of the criteria.
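
The pairwise check described above can be sketched as follows in Python.  The 
scores are hypothetical, and apart from Housing and Crime the criterion name 
used here (Arts) is an assumption made for illustration, not a confirmed 
column of the westernrates data set.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five cities (illustration only).
ratings = {
    "Housing": [1, 2, 3, 4, 5],
    "Crime": [2, 1, 4, 3, 5],
    "Arts": [5, 3, 4, 1, 2],
}

# Correlation for every pair of criteria.
correlations = {
    (a, b): pearson_r(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Because lower Housing and Crime scores are better while higher scores are 
better elsewhere, the sign of a correlation involving those two criteria needs 
to be read with that in mind.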

Graphic Analysis: scatter plots, histogram

SAS Product: Enterprise Guide

Size: 52 rows, 11 columns

----------------------------------------------------------------------- 
Name:  Wine Tasting

Analysis: nonparametric ANOVA; measures of agreement

Reference:  Collected for StatView Reference, 2nd edition. 1998. SAS Institute Inc.

Description:  In this experiment fifteen people rated six red wines.  Each wine 
was rated using criteria commonly used to judge wine quality.  The totals for 
each judge and wine were calculated.  You will determine whether there is a 
difference in the quality of the wines as determined by the judges.

These data could also be used to rank the wines and then determine whether 
there is agreement among the judges in terms of the ranks of the wines.

Some data manipulation will be necessary to conduct either of these analyses.
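
One possible form of that manipulation is sketched below in Python.  It assumes 
a layout with one row per judge and one column per wine, which the source does 
not confirm; the judge labels, wine labels, and scores are hypothetical.  The 
sketch reshapes the data, ranks the wines within each judge, and computes 
Kendall's coefficient of concordance (W) and the related Friedman statistic, 
without tie corrections.

```python
def average_ranks(values):
    """Rank values from 1 (lowest); ties share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


# Hypothetical totals: one row per judge, one column per wine.
wines = ["W1", "W2", "W3", "W4"]
wide = {
    "J1": [60, 72, 55, 80],
    "J2": [58, 70, 61, 77],
    "J3": [65, 68, 50, 90],
}

# Step 1: reshape to one record per judge-wine pair (long format).
long_records = [
    (judge, wine, score)
    for judge, row in wide.items()
    for wine, score in zip(wines, row)
]

# Step 2: rank the wines within each judge.
ranks = {judge: average_ranks(row) for judge, row in wide.items()}

# Step 3: Kendall's W from the per-wine rank sums.
m, k = len(wide), len(wines)            # number of judges, number of wines
rank_sums = [sum(ranks[j][i] for j in wide) for i in range(k)]
mean_rank_sum = m * (k + 1) / 2
s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
kendall_w = 12 * s / (m ** 2 * (k ** 3 - k))

# Friedman chi-square statistic, related to W by m * (k - 1) * W.
friedman_chi2 = m * (k - 1) * kendall_w

print(rank_sums, round(kendall_w, 4), round(friedman_chi2, 4))
```

W ranges from 0 (no agreement among judges) to 1 (identical rankings), and the 
Friedman statistic addresses whether the wines differ in quality.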

Graphic Analysis: scatter plot, histogram

SAS Product: Enterprise Guide; SAS/STAT

Size: 15 rows, 7 columns

----------------------------------------------------------------------- 
Name:  writing

Analysis: ANCOVA

Reference:  Generated for StatView Reference, 2nd edition. 1998. SAS Institute Inc.

Description:  A university English department wants to know whether its first-
year composition course is as effective for history and math majors as it is for 
English majors.  We could do a simple analysis of variance with final class 
scores as the dependent variable and major as the factor.  However, students 
could have differing verbal abilities, and we must control for that by including 
their Verbal SAT scores as a covariate.

The main question is whether the course is equally effective for students of 
different majors.  Second, we want to estimate the average class score for 
students in each major.  Finally, we want to know whether SAT scores are 
effective for controlling for variability among individual students.
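
The second question, estimating the average class score for each major after 
controlling for Verbal SAT, can be sketched as follows in Python.  The sketch 
assumes an equal-slopes ANCOVA model; the records and the field order (major, 
Verbal SAT, class score) are hypothetical and do not come from the writing 
data set.

```python
def ancova_adjusted_means(records):
    """Equal-slopes ANCOVA: pooled within-group slope and adjusted means.

    records is a list of (group, covariate, response) tuples.  Each adjusted
    mean is the group's mean response evaluated at the overall covariate mean.
    """
    groups = {}
    for group, x, y in records:
        groups.setdefault(group, []).append((x, y))

    grand_mean_x = sum(x for _, x, _ in records) / len(records)

    sxx = sxy = 0.0
    group_means = {}
    for group, points in groups.items():
        mean_x = sum(x for x, _ in points) / len(points)
        mean_y = sum(y for _, y in points) / len(points)
        group_means[group] = (mean_x, mean_y)
        # Accumulate within-group sums of squares and cross-products.
        sxx += sum((x - mean_x) ** 2 for x, _ in points)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in points)

    slope = sxy / sxx
    adjusted = {
        group: mean_y - slope * (mean_x - grand_mean_x)
        for group, (mean_x, mean_y) in group_means.items()
    }
    return slope, adjusted


# Hypothetical (major, Verbal SAT, class score) records (illustration only).
records = [
    ("English", 500, 70), ("English", 600, 80),
    ("History", 400, 60), ("History", 500, 70),
    ("Math", 450, 62), ("Math", 550, 72),
]
slope, adjusted_means = ancova_adjusted_means(records)
print(slope, adjusted_means)
```

In this made-up example the raw means for English and History differ by ten 
points, but their adjusted means coincide once the difference in Verbal SAT 
scores is taken into account.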

Graphic Analysis: box plot, histogram

SAS Product: Enterprise Guide; SAS/STAT

Size: 19 rows, 3 columns
