


Material Type: Notes; Class: STATISTICAL METHODS IN GEOG; Subject: Geography; University: Kent State University; Term: Unknown 1993;
Chapter 7: More on Regression

It's easy to begin explaining the concept of regression via simple linear (bivariate) regression, as we did last chapter. However, a more common type of regression is multiple regression, which refers to the fact that we check how two or more independent variables affect some dependent variable. The concept of independent and dependent variables is the same; there is simply more than one independent variable. The concept of the predictive equation is also the same, though it's a bit longer:

ŷ = a + b1x1 + b2x2 + b3x3 + ...

for as many independent variables (the x's) as you have. Last time we had two variables, and so graphically we fitted a line between them. If we have three variables (two independent ones), it's like fitting a flat plane on a 3-D grid (Figure 7.1 in your book gives you a good idea of what this looks like). Beyond this it can't be visualized, but the principle is the same.
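The predictive equation above can be estimated by ordinary least squares. Here is a minimal sketch with NumPy; the data are invented, and the "true" coefficients (a = 2.0, b1 = 1.5, b2 = -0.8) are assumptions chosen purely for the example:

```python
import numpy as np

# Invented data: y depends on two independent variables plus a little noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 0.1, n)

# Design matrix: a column of ones for the intercept a, then the x's.
X = np.column_stack([np.ones(n), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coeffs
print(a, b1, b2)  # estimates close to 2.0, 1.5, -0.8
```

With more independent variables you simply add more columns to the design matrix; the fitting step is unchanged, which is why the bivariate and multiple cases are conceptually the same.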
1. Multicollinearity

So let's say we have a dependent variable and a hundred independent variables. Can we enter all hundred? Sure, but is it useful? Most likely not. The problem is an issue called multicollinearity, which means that the independent variables are themselves highly correlated with one another. Say we have summertime ice cream sales as our dependent variable, and 2 pm temperature and 3 pm temperature as our independent ones. If it's hot at 2 pm, it's most likely also hot at 3 pm, so what additional information does the second variable provide? Not much. Worse, however, is that when you include two very similar variables, the coefficients in the regression equation are often difficult to interpret, sometimes counterintuitive, and can fluctuate significantly with the addition or removal of a single observation.

2. What do the coefficients mean?

In bivariate regression it's easy to interpret the coefficient, as there is only one independent variable. But with multiple coefficients, you have to be careful. Consider the example from the book. Considered separately,

House price = -57,809 + 36.2 (Lot size)
House price = 103,361 + 15,580 (Number of bedrooms)

But when considered together,

House price = -1,993 + 21.6 (Lot size) + 9,333 (Bedrooms)

What does this mean? Considering lot size, the house price goes up $21.60 per square foot if the number of bedrooms is held constant. Similarly, the house price goes up $9,333 per additional bedroom on a lot of the same size. Notice that when each independent variable is considered alone, its coefficient is larger: on its own, each variable absorbs some of the effect of the other.
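A small sketch of the combined equation from the text; the 8,000-square-foot lot and the bedroom counts are invented inputs used only for illustration:

```python
# Combined regression equation from the text.
# The lot size (8,000 sq ft) and bedroom counts are invented example inputs.
def house_price(lot_sqft, bedrooms):
    return -1993 + 21.6 * lot_sqft + 9333 * bedrooms

p3 = house_price(8000, 3)
p4 = house_price(8000, 4)
print(round(p3))       # 198806
print(round(p4 - p3))  # 9333: one more bedroom, same lot
```

Holding lot size fixed, each extra bedroom adds exactly the bedroom coefficient to the prediction; this is what "if the number of bedrooms is held constant" means in the interpretation above.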
3. Dummy variables

If we want to include a qualitative variable in the regression equation, there are ways of doing this. Say you want to see if ambulance calls vary, among other things, by whether the day is a weekday or a weekend. We can quantify this by using a dummy variable, where weekend = 1 if it's a weekend and weekend = 0 for a weekday. An actual equation for the city of Toronto is:

Ambulance calls = 445 + 3.1 (Temp. 5 PM) - 11 (weekend)

What this means is that for each degree increase in temperature there are 3.1 more ambulance calls; in terms of the dummy variable, a weekday would have 11 more ambulance calls than a weekend at the same temperature.

4. Selection of variables

So how do we end up selecting which variables to include in a model? There are a few different ways; we'll describe two here and then also discuss the example in Section 7.4. One way is to include all variables. If you only have a few, and you're sure they are not related to each other, this may work. Generally, though, this isn't recommended. The most common way is something called stepwise regression. Stepwise regression occurs (amazingly!) in steps, with one variable added or subtracted at a time. Recall how we can test the significance of a coefficient, as discussed last chapter. This procedure starts out testing
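The Toronto dummy-variable equation above can be written as a small function; the 25-degree temperature is an invented input for illustration:

```python
# Dummy-variable equation from the text:
# Ambulance calls = 445 + 3.1*(Temp. 5 PM) - 11*(weekend)
def ambulance_calls(temp_5pm, weekend):
    # weekend is the dummy: 1 on weekends, 0 on weekdays
    return 445 + 3.1 * temp_5pm - 11 * weekend

# Same temperature (25 degrees, an invented value), weekday vs. weekend:
weekday = ambulance_calls(25, 0)
weekend = ambulance_calls(25, 1)
print(round(weekday - weekend, 1))  # 11.0: the weekday has 11 more calls
```

The dummy simply shifts the intercept up or down for one category, leaving the temperature slope unchanged.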
ŷ = e^(a + bx) / (1 + e^(a + bx))

This yields values that range between 0 and 1 along an S-shaped curve. While the equation produces useful results, it is rather difficult to interpret the coefficient.
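A minimal sketch of this S-shaped curve; the values a = -2 and b = 1 are arbitrary choices made purely for illustration:

```python
import math

# The curve y-hat = e^(a+bx) / (1 + e^(a+bx)).
# a = -2 and b = 1 are arbitrary values chosen for illustration.
def logistic(x, a=-2.0, b=1.0):
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

# Output always stays between 0 and 1 along an S-shaped curve:
for x in (-5, 0, 2, 10):
    print(round(logistic(x), 3))  # 0.001, 0.119, 0.5, 1.0
```

The curve flattens near 0 and 1 and is steepest at its midpoint (here, where a + bx = 0), which is one reason a one-unit change in x has no single, constant effect on ŷ, making the coefficient hard to interpret directly.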
7. Other regression issues

Some other issues we'll discuss:

Autocorrelation
What if one value is correlated with the next? This can happen spatially or temporally. For instance, a hot day is more likely to be followed by another hot day. Not accounting for this can significantly bias a regression equation. A number of remedies can be used, including changing the variable to a differencing term, in other words the change in temperature from one day to the next.

Serial correlation
What about seasonal data? Something that you know varies on an annual cycle may give you misleading results if you don't account for this. We usually deal with this using running means or by standardizing the data to account for season.

Lag effects
Sometimes an observation on one day is better correlated with a dependent variable's value on a later day, if there's some sort of "lag effect". For instance, pollution levels downwind may not increase the day the pollution is released, but rather the day after. We can easily run a regression with the dependent variable's observations offset by a day or two to compensate.
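Two of the remedies above, differencing and lagging, can be sketched with NumPy; all three five-day series below are invented for the example:

```python
import numpy as np

# Invented five-day series used only to illustrate the two remedies.
temps = np.array([28.0, 30.0, 31.0, 27.0, 26.0])  # daily temperature
released = np.array([3.0, 6.0, 8.0, 9.0, 4.0])    # pollution released
downwind = np.array([5.0, 4.0, 7.0, 9.0, 10.0])   # downwind pollution level

# Differencing: replace the variable with its day-to-day change.
temp_change = np.diff(temps)
print(temp_change)  # [ 2.  1. -4. -1.]

# Lag effect: pair each day's release with the NEXT day's downwind
# reading by offsetting the dependent variable's observations by one day.
lagged = np.corrcoef(released[:-1], downwind[1:])[0, 1]
same_day = np.corrcoef(released, downwind)[0, 1]
print(lagged > same_day)  # True: the lagged pairing correlates better here
```

In these made-up data the downwind levels track the previous day's releases, so the one-day offset produces the stronger correlation, which is exactly the pattern a lag-adjusted regression would exploit.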