The coefficient of determination, R^2, is a statistical measure that indicates the proportion of the variability of a dependent variable (y) that can be explained by an independent variable (x) in a simple linear regression model. It ranges from 0 to 1, with higher values indicating a stronger linear relationship between x and y. These notes cover the concept of R^2, its calculation, and its interpretation, using examples and formulas.
Once we have decided that the slope β is not zero, so that a linear relationship seems to exist between x and y, it is useful to measure the strength of this linear relationship. Such a measure is provided by the coefficient of determination, R^2.
To understand R^2, note that one of the aims of regression analysis is to study the relationship between x and y, i.e., to try to use the value of x to "explain" y.
Recall the Salary vs. Height data.
[Figure: Fitted Line Plot for Salary vs. Height. Fitted line: Salary = -902.2 + 100.4 Height; R-Sq = 71.4%, R-Sq(adj) = 70.3%.]
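As a rough illustration of how such a fitted line plot is obtained, the following R sketch simulates hypothetical height/salary pairs (the numbers and variable names are made up, not the actual course data), fits the least-squares line, and reads off the fitted equation and R^2:

# Simulate illustrative data and fit Salary on Height by least squares.
set.seed(1)
Height <- runif(30, 65, 77)                          # hypothetical heights (inches)
Salary <- -900 + 100 * Height + rnorm(30, sd = 200)  # hypothetical salaries
fit <- lm(Salary ~ Height)
coef(fit)                   # intercept and slope of the fitted line
summary(fit)$r.squared      # coefficient of determination, R^2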
If we look at the y's (the salaries) as a data set, we note that they are not all the same; the y's exhibit variability. A rough measure of this variability is the total sum of squares,

SST = Σ_{i=1}^{n} ( y_i − ȳ )^2,

where ȳ is the sample mean of the y's.
Note that SST is (n − 1) times the sample variance of the y's.
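This identity is easy to check numerically; a minimal R sketch with an arbitrary, made-up vector y:

# SST equals (n - 1) times the sample variance, because var() in R
# uses the (n - 1) denominator.
y <- c(5500, 5750, 6000, 6250, 6500)      # illustrative values only
SST <- sum((y - mean(y))^2)
all.equal(SST, (length(y) - 1) * var(y))  # TRUE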
If there is a linear relationship between x and y, then the variability of the y's is not due entirely to chance fluctuations.
Instead, the fact that the salaries are different can be partially "explained" by the fact that the heights (x) are different. Of course, salary is not completely explained by height, so part of the variability in the salaries remains unexplained.
This idea can be made precise. It can be shown that

Σ_{i=1}^{n} ( y_i − ȳ )^2 = Σ_{i=1}^{n} ( ŷ_i − ȳ )^2 + Σ_{i=1}^{n} ( y_i − ŷ_i )^2,

where ŷ_i is the fitted value for observation i, or SST = SSR + SSE.

Interpretation: The variability of the y's (SST) can be broken into two parts, SSR + SSE.

The regression sum of squares, SSR = Σ_{i=1}^{n} ( ŷ_i − ȳ )^2, is the part of the variability that is "explained" by the regression. This is simply (n − 1) times the sample variance of the fitted values. The error sum of squares, SSE = Σ_{i=1}^{n} ( y_i − ŷ_i )^2, is the part that remains unexplained.
Now, we define the coefficient of determination by

R^2 = SSR / SST.

An equivalent definition is

R^2 = 1 − SSE / SST,

since SSE = SST − SSR.
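These definitions are easy to check numerically on any fitted simple linear regression. A minimal R sketch, using simulated data and illustrative variable names rather than the course data:

# Verify SST = SSR + SSE and R^2 = SSR/SST = 1 - SSE/SST on simulated data.
set.seed(2)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, sd = 10)
fit  <- lm(y ~ x)
yhat <- fitted(fit)                  # the fitted values y-hat
SST <- sum((y - mean(y))^2)          # total sum of squares
SSR <- sum((yhat - mean(y))^2)       # "explained" (regression) sum of squares
SSE <- sum((y - yhat)^2)             # "unexplained" (error) sum of squares
all.equal(SST, SSR + SSE)                              # TRUE
c(SSR / SST, 1 - SSE / SST, summary(fit)$r.squared)    # all three agree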
We will get R^2 = 1 if, and only if, all points lie exactly on a straight (non-horizontal) line.
The closer R^2 is to 1, the stronger the linear relationship between x and y.
If R^2 is near zero, then almost none of the variability of y is explained by x, so the linear relationship is weak.
We will get R^2 = 0 if, and only if, the fitted slope β̂ = 0.
This can happen in a variety of ways, including: (1) All y's lie on a horizontal line;
(2) The data points lie on a parabola y = a + b x^2 whose peak (or trough) falls in the middle of the range of the equally-spaced x's, so that the best-fitting straight line is horizontal (see the sketch below).
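A minimal R sketch of case (2), using an arbitrary symmetric parabola, shows how a perfect but purely nonlinear relationship gives a fitted slope of zero and R^2 = 0:

# Equally spaced x's centered at zero; y lies exactly on the parabola y = 10 - x^2.
x <- -5:5
y <- 10 - x^2
fit <- lm(y ~ x)
coef(fit)["x"]              # fitted slope is (numerically) zero
summary(fit)$r.squared      # R^2 is (numerically) zero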
If R^2 is large, we say that x and y are "highly correlated". In this case, there is a strong linear relationship between x and y.
If R^2 is near zero, we say that x and y are nearly "uncorrelated". In this case, the linear relationship is weak.
Eg: Since 71.36% of the variability in salary is "explained" by height, the linear relationship is strong. Height is a good predictor of salary. The other 28.64% of the variability in salary is unexplained, but we could try to include more variables in our regression. This would increase R^2, since adding variables can never decrease it. (We will return to this point later.)
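As a quick illustration of that point (simulated data, not the salary example): adding a predictor, even one that is pure noise, can never decrease R^2, because least squares is free to give it a coefficient of zero.

# Compare R^2 before and after adding an unrelated noise predictor x2.
set.seed(3)
x1 <- rnorm(40)
x2 <- rnorm(40)                        # pure noise, unrelated to y
y  <- 5 + 2 * x1 + rnorm(40)
summary(lm(y ~ x1))$r.squared          # simple regression R^2
summary(lm(y ~ x1 + x2))$r.squared     # never smaller than the value above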
Eg: For the Stock Market example, the Minitab output shows that R^2 = 0.0506. Only 5.06% of the variability in Today's returns is "explained" by Yesterday's returns.
Although the linear relationship is statistically significant (low p-value), it is still quite weak (low R^2).
The forecast of today’s return based on yesterday’s return will not be very accurate.
Regression Analysis: Today versus Yesterday
Analysis of Variance
Source        DF       SS        MS    F-Value  P-Value
Regression     1    100.0   100.042      71.32    0.000
Error       1338   1876.7     1.403
Total       1339   1976.7

Model Summary
      S    R-sq  R-sq(adj)
1.18433   5.06%      4.99%

Coefficients
Term          Coef  SE Coef  T-Value  P-Value
Constant    0.0846   0.0324     2.61    0.009
Yesterday  -0.2249   0.0266    -8.45    0.000

Regression Equation
Today = 0.0846 - 0.2249 Yesterday
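To see how this pattern (a tiny R^2 together with a highly significant slope) arises when n is large, here is a simulated R sketch; the data are made up and are not the actual return series:

# With n = 1340 observations, even a weak relationship is highly significant.
set.seed(4)
yesterday <- rnorm(1340)
today     <- -0.2 * yesterday + rnorm(1340)          # weak negative dependence
fit <- lm(today ~ yesterday)
summary(fit)$r.squared                               # small (a few percent)
summary(fit)$coefficients["yesterday", "Pr(>|t|)"]   # yet essentially zero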
[R Demo: LeastSquaresFitWithRsquare]