Shrinkage and Adjusted R-Square
Statistics and Research Design II (PSY 862), Eastern Kentucky University - Study Notes
Shrinkage: The choice of weights in a regression analysis is designed to yield the
highest possible correlation between the independent variables and the dependent
variable (R-square). That is, the multiple correlation can be expressed as the correlation
between the predicted scores based on the regression equation (Y-predicted) and the
observed criterion scores (actual Y). Because the weights are chosen to produce the largest
possible correlation, the regression equation may be “overfitted” to idiosyncrasies of the
sample data. If the development sample contains some unusual or atypical individuals, they
may pull the regression equation away from the relationship that holds in the population.
If you drew another sample that did not contain these outliers, the results could be quite
different. It can be demonstrated that when a regression equation developed on one sample
is applied to a new sample, the resulting R-square is almost always smaller than the
R-square obtained with the original development sample. This phenomenon is called shrinkage.
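As a quick illustration of the first point, the minimal sketch below (not from the original notes; the synthetic data and the use of numpy are my own assumptions) checks that the R-square from a fitted equation equals the squared correlation between Y-predicted and actual Y in the development sample.

import numpy as np

# Illustrative data only: n cases, k predictors, plus random noise.
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)

# Fit ordinary least squares weights (with an intercept column).
X1 = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ b

# R-square from the usual variance-accounted-for definition ...
r_square = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
# ... equals the squared correlation between predicted and observed Y.
r_yyhat = np.corrcoef(y, y_hat)[0, 1]
print(round(r_square, 4), round(r_yyhat ** 2, 4))  # the two values match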
Dealing with Shrinkage:
There are a couple of ways to deal with shrinkage in a multiple regression analysis:
Adjusted R-Square:
R-square can be interpreted as the percent of criterion variance
accounted for by the linear combination of the predictors. The sample multiple correlation and
the squared multiple correlation are biased estimates of their corresponding population values.
The sample R-square typically overestimates the population R-square and needs to be adjusted
downward. The adjusted R-square reported by SPSS makes the adjustment assuming a fixed
effects model. There are several other formulas for computing the adjusted R-square depending
on which assumptions are made.
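As a worked example (not part of the original notes), one widely used formula, generally attributed to Wherry/Ezekiel and, to my understanding, the form SPSS reports, is adjusted R-square = 1 - (1 - R-square)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. A minimal sketch, with illustrative numbers:

def adjusted_r_square(r_square: float, n: int, k: int) -> float:
    """One common adjustment (Wherry/Ezekiel form): shrink R-square
    based on sample size n and number of predictors k. Other formulas
    exist, depending on the assumptions made."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Example: an R-square of .40 from 50 cases and 5 predictors
# adjusts downward to about .332.
print(round(adjusted_r_square(0.40, n=50, k=5), 3))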
Cross-Validation: Probably the best method for estimating the degree of shrinkage is to perform
a cross-validation. This is done by using two samples for which both the predictor variables and
the criterion (outcome) variable are known. The first sample is the development sample; the
second sample is called the calibration sample. The steps are as follows (a sketch appears after
the list):
1. For the development sample, a regular regression analysis is performed, and R-square and
the regression equation are calculated.
2. The regression equation is then applied to the predictor variables of the second sample,
yielding a Y-predicted for each subject.
3. The correlation between the predicted scores and the observed criterion scores in the
calibration sample is computed.
4. The difference between the R-square obtained from the development sample and the
R-square obtained from the calibration sample is an estimate of the amount of shrinkage.
If the shrinkage is small, the regression equation is considered valid.
5. Normally the two samples are then combined to produce one large development sample,
and the final regression equation is computed.
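A minimal sketch of steps 1 through 4 above (my own illustration; the synthetic samples and numpy usage are assumptions, not material from the notes):

import numpy as np

rng = np.random.default_rng(1)

def simulate_sample(n, weights, noise=1.0):
    # Hypothetical sample: predictors X and criterion y with known weights.
    X = rng.normal(size=(n, len(weights)))
    y = X @ weights + rng.normal(scale=noise, size=n)
    return X, y

true_w = np.array([0.5, 0.3, 0.0, 0.0])        # illustrative population weights
X_dev, y_dev = simulate_sample(40, true_w)     # development sample
X_cal, y_cal = simulate_sample(40, true_w)     # calibration sample

# Step 1: fit the regression on the development sample and get its R-square.
X1_dev = np.column_stack([np.ones(len(y_dev)), X_dev])
b, *_ = np.linalg.lstsq(X1_dev, y_dev, rcond=None)
r2_dev = np.corrcoef(y_dev, X1_dev @ b)[0, 1] ** 2

# Steps 2-3: apply that same equation to the calibration predictors and
# correlate the predicted scores with the observed criterion scores.
X1_cal = np.column_stack([np.ones(len(y_cal)), X_cal])
r2_cal = np.corrcoef(y_cal, X1_cal @ b)[0, 1] ** 2

# Step 4: the drop in R-square estimates the shrinkage.
print(f"development R2 = {r2_dev:.3f}, calibration R2 = {r2_cal:.3f}, "
      f"shrinkage = {r2_dev - r2_cal:.3f}")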
