Shrinkage and Adjusted R-Square
Statistics and Research Design II (PSY 862), Eastern Kentucky University - Study Notes
Shrinkage: The choice of weights in a regression analysis is designed to yield the
highest possible correlation between the independent variables and the dependent
variable (R-square). That is, the multiple correlation can be expressed as the correlation
between the predicted scores based on the regression equation (Y-predicted) and the
observed criterion scores (actual Y). Because the weights are chosen to produce the largest
possible correlation, the regression equation may be “overfitted” to idiosyncrasies of the
sample data. If the development sample contains some unusual or atypical individuals, they
may pull the regression equation away from the relationship that holds in the population.
If you drew another sample that did not contain these outliers, the results could be quite
different. It can be demonstrated that when a regression equation developed on one sample
is applied to a new sample, the resulting R-square is almost always smaller than the
R-square obtained with the original development sample. This phenomenon is called shrinkage.
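As a quick illustration of the first point, the minimal sketch below (not from the original notes; the synthetic data and the use of numpy are my own assumptions) checks that the R-square from a fitted equation equals the squared correlation between Y-predicted and actual Y in the development sample.

import numpy as np

# Illustrative data only: n cases, k predictors, plus random noise.
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)

# Fit ordinary least squares weights (with an intercept column).
X1 = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ b

# R-square from the usual variance-accounted-for definition ...
r_square = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
# ... equals the squared correlation between predicted and observed Y.
r_yyhat = np.corrcoef(y, y_hat)[0, 1]
print(round(r_square, 4), round(r_yyhat ** 2, 4))  # the two values match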
Dealing with Shrinkage:
There are a couple of ways to deal with shrinkage in a multiple regression analysis:
Adjusted R-Square:
R-square can be interpreted as the percent of criterion variance
accounted for by the linear combination of the predictors. The sample multiple correlation and
the squared multiple correlation are biased estimates of their corresponding population values.
The sample R-square typically overestimates the population R-square and needs to be adjusted
downward. The adjusted R-square reported by SPSS makes the adjustment assuming a fixed
effects model. There are several other formulas for computing the adjusted R-square depending
on which assumptions are made.
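As a worked example (not part of the original notes), one widely used formula, generally attributed to Wherry/Ezekiel and, to my understanding, the form SPSS reports, is adjusted R-square = 1 - (1 - R-square)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. A minimal sketch, with illustrative numbers:

def adjusted_r_square(r_square: float, n: int, k: int) -> float:
    """One common adjustment (Wherry/Ezekiel form): shrink R-square
    based on sample size n and number of predictors k. Other formulas
    exist, depending on the assumptions made."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Example: an R-square of .40 from 50 cases and 5 predictors
# adjusts downward to about .332.
print(round(adjusted_r_square(0.40, n=50, k=5), 3))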
Cross-Validation: Probably the best method for estimating the degree of shrinkage is to perform
a cross-validation. This is done by using two samples for which both the predictor variables and
the criterion (outcome) variable are known. The first sample is the development sample; the
second sample is called the calibration sample. The steps are as follows (a sketch appears after
the list):
1. For the development sample, a regular regression analysis is performed, and R-square and
the regression equation are calculated.
2. The regression equation is then applied to the predictor variables of the second sample,
yielding a Y-predicted for each subject.
3. The correlation between the predicted scores and the observed criterion scores in the
calibration sample is computed.
4. The difference between the R-square obtained from the development sample and the
R-square obtained from the calibration sample is an estimate of the amount of shrinkage.
If the shrinkage is small, the regression equation is considered valid.
5. Normally the two samples are then combined to produce one large development sample,
and the final regression equation is computed.
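A minimal sketch of steps 1 through 4 above (my own illustration; the synthetic samples and numpy usage are assumptions, not material from the notes):

import numpy as np

rng = np.random.default_rng(1)

def simulate_sample(n, weights, noise=1.0):
    # Hypothetical sample: predictors X and criterion y with known weights.
    X = rng.normal(size=(n, len(weights)))
    y = X @ weights + rng.normal(scale=noise, size=n)
    return X, y

true_w = np.array([0.5, 0.3, 0.0, 0.0])        # illustrative population weights
X_dev, y_dev = simulate_sample(40, true_w)     # development sample
X_cal, y_cal = simulate_sample(40, true_w)     # calibration sample

# Step 1: fit the regression on the development sample and get its R-square.
X1_dev = np.column_stack([np.ones(len(y_dev)), X_dev])
b, *_ = np.linalg.lstsq(X1_dev, y_dev, rcond=None)
r2_dev = np.corrcoef(y_dev, X1_dev @ b)[0, 1] ** 2

# Steps 2-3: apply that same equation to the calibration predictors and
# correlate the predicted scores with the observed criterion scores.
X1_cal = np.column_stack([np.ones(len(y_cal)), X_cal])
r2_cal = np.corrcoef(y_cal, X1_cal @ b)[0, 1] ** 2

# Step 4: the drop in R-square estimates the shrinkage.
print(f"development R2 = {r2_dev:.3f}, calibration R2 = {r2_cal:.3f}, "
      f"shrinkage = {r2_dev - r2_cal:.3f}")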
