Understanding the Coefficient of Determination (R^2) in Simple Linear Regression

The coefficient of determination, R^2, is a statistical measure that indicates the proportion of the variability of a dependent variable (y) that can be explained by an independent variable (x) in a simple linear regression model. It ranges from 0 to 1, with higher values indicating a stronger linear relationship between x and y. This document covers the concept of R^2, its calculation, and its interpretation, using examples and formulas.

Typology: Schemes and Mind Maps

2021/2022

Uploaded on 09/27/2022

lalitdiya

19. SIMPLE LINEAR REGRESSION IV

The Coefficient of Determination, R^2

Once we have decided that β is not zero, so that a linear relationship seems to exist between x and y, it is useful to measure the strength of this linear relationship. Such a measure is provided by the coefficient of determination, R^2.

To understand R^2, note that one of the aims of regression analysis is to study the relationship between x and y, i.e., to try to use the value of x to "explain" y.

  • Keep in mind, though, that this "explanation" may not be one of cause and effect.

Recall the Salary vs. Height data.

[Figure: Fitted Line Plot for Salary vs. Height, with Height on the horizontal axis and Salary on the vertical axis. Fitted line: Salary = -902.2 + 100.4 Height. R-Sq 71.4%, R-Sq(adj) 70.3%, S 192.]

If we look at the y's (the salaries) as a data set, we note that they are not all the same; the y's exhibit variability. A rough measure of this variability is the total sum of squares,

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

Note that SST is (n − 1) times the sample variance of the y's.

If there is a linear relationship between x and y, then the variability of the y's is not due entirely to chance fluctuations.

Instead, the fact that the salaries are different can be partially "explained" by the fact that the heights (x) are different. Of course, salary is not completely explained by height, so part of the variability in the salaries remains unexplained.
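The relation between SST and the sample variance of the y's can be checked numerically. The following is a minimal sketch; the salary values are invented purely for illustration and are not the course data.

```python
from statistics import variance

# Hypothetical salaries, made up for this illustration (not the course data).
y = [5600.0, 5900.0, 6100.0, 6300.0, 6600.0]

n = len(y)
y_bar = sum(y) / n

# Total sum of squares: squared deviations of each y from the mean of the y's.
sst = sum((yi - y_bar) ** 2 for yi in y)

# statistics.variance divides by (n - 1), i.e. it is the sample variance.
sample_var = variance(y)

print("SST                     =", sst)
print("(n - 1) * sample variance =", (n - 1) * sample_var)
```

The two printed quantities should agree up to floating-point rounding.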

  • Interestingly, the variability in salaries can be broken into two parts, the first attributed to differences in height, and the second attributed to other factors not yet accounted for.
  • We have the following important formula:

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$

or SST = SSR + SSE.

Interpretation: The variability of the y's (SST) can be broken into two parts, SSR + SSE.

  • The first part is the regression sum of squares,

$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$

This is simply (n − 1) times the sample variance of the fitted values.
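The decomposition SST = SSR + SSE can be verified on a small data set. The sketch below assumes the usual least squares formulas for the slope and intercept; the function names `least_squares` and `sums_of_squares` and the height and salary values are hypothetical, chosen only for illustration.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least squares line through (x, y)."""
    intercept, slope = least_squares(x, y)
    y_bar = sum(y) / len(y)
    fitted = [intercept + slope * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)          # explained by x
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # left unexplained
    return sst, ssr, sse


# Hypothetical heights and salaries, made up for this illustration.
x = [66.0, 68.0, 70.0, 72.0, 74.0]
y = [5700.0, 6000.0, 6050.0, 6400.0, 6500.0]

sst, ssr, sse = sums_of_squares(x, y)
print("SST =", sst)
print("SSR + SSE =", ssr + sse)
```

For these made-up values the fitted line has slope 100, and the two printed totals should agree up to rounding.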

Now, we define the coefficient of determination by

$$R^2 = \frac{SSR}{SST}.$$

  • We see that R^2 measures the proportion of the variability of y that is "explained" by x.

An equivalent definition is

$$R^2 = 1 - \frac{SSE}{SST}.$$

  • It can be shown that 0 ≤ R^2 ≤ 1.

We will get R^2 = 1 if, and only if, all points lie exactly on a straight (non-horizontal) line.

The closer R^2 is to 1, the stronger the linear relationship between x and y.

If R^2 is near zero, then almost none of the variability of y is explained by x, so the linear relationship is weak.
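The equivalence of the two definitions of R^2, and the bounds on R^2, both follow from the decomposition SST = SSR + SSE. A short derivation, assuming SST > 0 (the y's are not all equal):

```latex
R^2 = \frac{SSR}{SST}
    = \frac{SST - SSE}{SST}
    = 1 - \frac{SSE}{SST},
\qquad
SSR \ge 0,\ SSE \ge 0
\;\Longrightarrow\;
0 \le R^2 \le 1 .
```

Both SSR and SSE are sums of squares, so neither can be negative; that is all the bounds require.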

We will get R^2 = 0 if, and only if,

$$\hat{\beta} = 0.$$

This can happen in a variety of ways, including:

(1) All y's lie on a horizontal line;

(2) The data points lie on a parabola y = a + b x^2, which peaks in the middle of the range of the equally-spaced x's.

  • Note that in (2), there is a clear nonlinear relationship but no linear relationship whatsoever! So keep in mind that R^2 only measures the strength of the linear relationship.

If R^2 is large, we say that x and y are "highly correlated". In this case, there is a strong linear relationship between x and y.

If R^2 is near zero, we say that x and y are nearly "uncorrelated". In this case, the linear relationship is weak.
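The parabola case can be illustrated with a few points. This is a minimal sketch: the helper `slope_and_r_squared` is a hypothetical name, the constants a = 4 and b = -1 are arbitrary choices, and the function assumes the y's are not all equal (so that SST > 0).

```python
def slope_and_r_squared(x, y):
    """Fit the least squares line and return (slope, R^2 = 1 - SSE/SST).

    Assumes the y's are not all equal, so that SST > 0.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, 1 - sse / sst


# Equally spaced x's, symmetric about the peak of the parabola at x = 0.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Points exactly on a parabola y = a + b x^2 (a = 4, b = -1 chosen arbitrarily).
y_parabola = [4.0 - xi ** 2 for xi in x]
slope_p, r2_p = slope_and_r_squared(x, y_parabola)

# Points exactly on a non-horizontal straight line, for contrast.
y_line = [1.0 + 2.0 * xi for xi in x]
slope_l, r2_l = slope_and_r_squared(x, y_line)

print("parabola: slope =", slope_p, " R^2 =", r2_p)
print("line:     slope =", slope_l, " R^2 =", r2_l)
```

The parabola gives a fitted slope of zero and R^2 of zero even though y is completely determined by x, while the straight line gives R^2 equal to one.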

Eg: Since 71.36% of the variability in salary is “explained” by height, the linear relationship is strong. Height is a good predictor of salary. The other 28.64% of the variability in salary is unexplained, but we could try to include more variables in our regression. Adding variables can never decrease R^2, and it will usually increase it. (We will return to this point later.)

Eg: For the Stock Market example, the Minitab output shows that R^2 = 0.0506. Only 5.06% of the variability in Today's returns is "explained" by Yesterday's returns.

Although the linear relationship is statistically significant (low p-value), it is still quite weak (low R^2).

The forecast of today’s return based on yesterday’s return will not be very accurate.

Regression Analysis: Today versus Yesterday

Analysis of Variance

Source        DF      SS       MS  F-Value  P-Value
Regression     1   100.0  100.042    71.32       0.
Error       1338  1876.7       1.
Total       1339   1976.

Model Summary

      S   R-sq  R-sq(adj)
1.18433  5.06%      4.99%

Coefficients

Term          Coef  SE Coef  T-Value  P-Value
Constant    0.0846   0.0324     2.61       0.
Yesterday  -0.2249   0.0266    -8.45       0.

Regression Equation

Today = 0.0846 - 0.2249 Yesterday
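The summary figures in the Minitab output can be reproduced from the Analysis of Variance entries. The sketch below rests on two assumptions: the Total sum of squares is printed incompletely, so it is rebuilt here as SSR + SSE, and R-sq(adj) is computed with the standard definition 1 − [SSE/(n − 2)] / [SST/(n − 1)], which the notes themselves do not state.

```python
from math import sqrt

# Entries copied from the Analysis of Variance table.
ssr = 100.0       # Regression SS
msr = 100.042     # Regression MS (1 degree of freedom)
sse = 1876.7      # Error SS
df_error = 1338
df_total = 1339

# Assumption: the printed Total SS is cut off, so rebuild it from SST = SSR + SSE.
sst = ssr + sse

r2 = ssr / sst                       # proportion of variability explained
mse = sse / df_error                 # mean squared error
s = sqrt(mse)                        # the "S" in the Model Summary
f_value = msr / mse                  # F statistic for the regression
r2_adj = 1 - mse / (sst / df_total)  # standard adjusted R^2 (not defined in the notes)

print(f"R-sq      = {100 * r2:.2f}%")
print(f"R-sq(adj) = {100 * r2_adj:.2f}%")
print(f"S         = {s:.4f}")
print(f"F-Value   = {f_value:.2f}")
```

Because the table entries are rounded, the recomputed S and F-Value may differ from the printed ones in the last digit.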

[R Demo: LeastSquaresFitWithRsquare]