The coefficient of determination, R^2, is a statistical measure that indicates the proportion of the variability of a dependent variable (y) that can be explained by an independent variable (x) in a simple linear regression model. It ranges from 0 to 1, with higher values indicating a stronger linear relationship between x and y. These notes cover the concept of R^2, its calculation, and its interpretation, using examples and formulas.
Once we have decided that the slope β is not zero, so that a linear relationship seems to exist between x and y, it is useful to measure the strength of this linear relationship. Such a measure is provided by the coefficient of determination, R^2.
To understand R^2, note that one of the aims of regression analysis is to study the relationship between x and y, i.e., to try to use the value of x to "explain" y.
Recall the Salary vs. Height data.
[Figure: Fitted Line Plot for Salary vs. Height. Fitted line: Salary = -902.2 + 100.4 Height; R-Sq = 71.4%, R-Sq(adj) = 70.3%.]
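As a rough illustration of how such a fitted line plot is obtained, the following R sketch simulates hypothetical height/salary pairs (the numbers and variable names are made up, not the actual course data), fits the least-squares line, and reads off the fitted equation and R^2:

# Simulate illustrative data and fit Salary on Height by least squares.
set.seed(1)
Height <- runif(30, 65, 77)                          # hypothetical heights (inches)
Salary <- -900 + 100 * Height + rnorm(30, sd = 200)  # hypothetical salaries
fit <- lm(Salary ~ Height)
coef(fit)                   # intercept and slope of the fitted line
summary(fit)$r.squared      # coefficient of determination, R^2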
If we look at the y's (the salaries) as a data set, we note that they are not all the same; the y's exhibit variability. A rough measure of this variability is the total sum of squares,

SST = Σ_{i=1}^{n} ( y_i − ȳ )^2,

where ȳ is the sample mean of the y's.
Note that SST is (n − 1) times the sample variance of the y's.
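This identity is easy to check numerically; a minimal R sketch with an arbitrary, made-up vector y:

# SST equals (n - 1) times the sample variance, because var() in R
# uses the (n - 1) denominator.
y <- c(5500, 5750, 6000, 6250, 6500)      # illustrative values only
SST <- sum((y - mean(y))^2)
all.equal(SST, (length(y) - 1) * var(y))  # TRUE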
If there is a linear relationship between x and y, then the variability of the y's is not due entirely to chance fluctuations.
Instead, the fact that the salaries are different can be partially "explained" by the fact that the heights (x) are different. Of course, salary is not completely explained by height, so part of the variability in the salaries remains unexplained.
This idea can be made precise. It can be shown that

Σ_{i=1}^{n} ( y_i − ȳ )^2 = Σ_{i=1}^{n} ( ŷ_i − ȳ )^2 + Σ_{i=1}^{n} ( y_i − ŷ_i )^2,

where ŷ_i is the fitted value for observation i, or SST = SSR + SSE.

Interpretation: The variability of the y's (SST) can be broken into two parts, SSR + SSE.

The regression sum of squares, SSR = Σ_{i=1}^{n} ( ŷ_i − ȳ )^2, is the part of the variability that is "explained" by the regression. This is simply (n − 1) times the sample variance of the fitted values. The error sum of squares, SSE = Σ_{i=1}^{n} ( y_i − ŷ_i )^2, is the part that remains unexplained.
Now, we define the coefficient of determination by

R^2 = SSR / SST.

An equivalent definition is

R^2 = 1 − SSE / SST,

since SSE = SST − SSR.
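These definitions are easy to check numerically on any fitted simple linear regression. A minimal R sketch, using simulated data and illustrative variable names rather than the course data:

# Verify SST = SSR + SSE and R^2 = SSR/SST = 1 - SSE/SST on simulated data.
set.seed(2)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, sd = 10)
fit  <- lm(y ~ x)
yhat <- fitted(fit)                  # the fitted values y-hat
SST <- sum((y - mean(y))^2)          # total sum of squares
SSR <- sum((yhat - mean(y))^2)       # "explained" (regression) sum of squares
SSE <- sum((y - yhat)^2)             # "unexplained" (error) sum of squares
all.equal(SST, SSR + SSE)                              # TRUE
c(SSR / SST, 1 - SSE / SST, summary(fit)$r.squared)    # all three agree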
We will get R^2 = 1 if, and only if, all points lie exactly on a straight (non-horizontal) line.
The closer R^2 is to 1, the stronger the linear relationship between x and y.
If R^2 is near zero, then almost none of the variability of y is explained by x, so the linear relationship is weak.
We will get R^2 = 0 if, and only if, the fitted slope β̂ = 0.
This can happen in a variety of ways, including: (1) All y's lie on a horizontal line;
(2) The data points lie on a parabola y = a + b x^2 whose peak (or trough) falls in the middle of the range of the equally-spaced x's, so that the best-fitting straight line is horizontal (see the sketch below).
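A minimal R sketch of case (2), using an arbitrary symmetric parabola, shows how a perfect but purely nonlinear relationship gives a fitted slope of zero and R^2 = 0:

# Equally spaced x's centered at zero; y lies exactly on the parabola y = 10 - x^2.
x <- -5:5
y <- 10 - x^2
fit <- lm(y ~ x)
coef(fit)["x"]              # fitted slope is (numerically) zero
summary(fit)$r.squared      # R^2 is (numerically) zero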
If R^2 is large, we say that x and y are "highly correlated". In this case, there is a strong linear relationship between x and y.
If R^2 is near zero, we say that x and y are nearly "uncorrelated". In this case, the linear relationship is weak.
Eg: Since 71.36% of the variability in salary is "explained" by height, the linear relationship is strong. Height is a good predictor of salary. The other 28.64% of the variability in salary is unexplained, but we could try to include more variables in our regression. This would increase R^2, since adding variables can never decrease it. (We will return to this point later.)
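As a quick illustration of that point (simulated data, not the salary example): adding a predictor, even one that is pure noise, can never decrease R^2, because least squares is free to give it a coefficient of zero.

# Compare R^2 before and after adding an unrelated noise predictor x2.
set.seed(3)
x1 <- rnorm(40)
x2 <- rnorm(40)                        # pure noise, unrelated to y
y  <- 5 + 2 * x1 + rnorm(40)
summary(lm(y ~ x1))$r.squared          # simple regression R^2
summary(lm(y ~ x1 + x2))$r.squared     # never smaller than the value above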
Eg: For the Stock Market example, the Minitab output shows that R^2 = 0.0506. Only 5.06% of the variability in Today's returns is "explained" by Yesterday's returns.
Although the linear relationship is statistically significant (low p-value), it is still quite weak (low R^2).
The forecast of today’s return based on yesterday’s return will not be very accurate.
Regression Analysis: Today versus Yesterday
Analysis of Variance
Source        DF       SS        MS    F-Value  P-Value
Regression     1    100.0   100.042      71.32    0.000
Error       1338   1876.7     1.403
Total       1339   1976.7

Model Summary
      S    R-sq  R-sq(adj)
1.18433   5.06%      4.99%

Coefficients
Term          Coef  SE Coef  T-Value  P-Value
Constant    0.0846   0.0324     2.61    0.009
Yesterday  -0.2249   0.0266    -8.45    0.000

Regression Equation
Today = 0.0846 - 0.2249 Yesterday
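To see how this pattern (a tiny R^2 together with a highly significant slope) arises when n is large, here is a simulated R sketch; the data are made up and are not the actual return series:

# With n = 1340 observations, even a weak relationship is highly significant.
set.seed(4)
yesterday <- rnorm(1340)
today     <- -0.2 * yesterday + rnorm(1340)          # weak negative dependence
fit <- lm(today ~ yesterday)
summary(fit)$r.squared                               # small (a few percent)
summary(fit)$coefficients["yesterday", "Pr(>|t|)"]   # yet essentially zero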
[R Demo: LeastSquaresFitWithRsquare]