


Material Type: Notes; Class: STATISTICAL METHODS IN GEOG; Subject: Geography; University: Kent State University; Term: Unknown 1993;
Chapter 7: More on Regression

It's easy to begin explaining the concept of regression via simple linear (bivariate) regression, as we did last chapter. However, a more common type of regression is multiple regression, which refers to the fact that we check how two or more independent variables affect some dependent variable. The concept of independent and dependent variables is the same; there is simply more than one independent variable. The concept of the predictive equation is also the same, though it's a bit longer:

ŷ = a + b1x1 + b2x2 + b3x3 + ...

for as many independent variables (the x's) as you have. Last time we had two variables, and so graphically we fitted a line between them. If we have three variables (two independent ones), it's like fitting a flat plane on a 3-D grid (Figure 7.1 in your book gives you a good idea of what this looks like). Beyond this it can't be visualized, but the principle is the same.
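The predictive equation above can be estimated by ordinary least squares. Here is a minimal sketch with NumPy; the data are invented, and the "true" coefficients (a = 2.0, b1 = 1.5, b2 = -0.8) are assumptions chosen purely for the example:

```python
import numpy as np

# Invented data: y depends on two independent variables plus a little noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 0.1, n)

# Design matrix: a column of ones for the intercept a, then the x's.
X = np.column_stack([np.ones(n), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coeffs
print(a, b1, b2)  # estimates close to 2.0, 1.5, -0.8
```

With more independent variables you simply add more columns to the design matrix; the fitting step is unchanged, which is why the bivariate and multiple cases are conceptually the same.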
1. Multicollinearity

So let's say we have a dependent variable and a hundred independent variables. Can we enter all hundred? Sure, but is it useful? Most likely not. The problem is an issue called multicollinearity, which means that the independent variables are themselves highly correlated with one another. Say we have summertime ice cream sales as our dependent variable, and 2 pm temperature and 3 pm temperature as our independent ones. If it's hot at 2 pm, it's most likely also hot at 3 pm, so what additional information does the second variable provide? Not much. Worse, however, is that when you include two very similar variables, the coefficients in the regression equation are often difficult to interpret, sometimes counterintuitive, and can fluctuate significantly with the addition or removal of a single observation.

2. What do the coefficients mean?

In bivariate regression it's easy to interpret the coefficient, as there is only one independent variable. But with multiple coefficients, you have to be careful. Consider the example from the book. Considered separately,

House price = -57,809 + 36.2 (Lot size)
House price = 103,361 + 15,580 (Number of bedrooms)

But when considered together,

House price = -1,993 + 21.6 (Lot size) + 9,333 (Bedrooms)

What does this mean? Considering lot size, the house price goes up $21.60 per square foot if the number of bedrooms is held constant. Similarly, the house price goes up $9,333 per additional bedroom on a lot of the same size. Notice that when each independent variable is considered alone, its coefficient is larger: on its own, each variable absorbs some of the effect of the other.
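A small sketch of the combined equation from the text; the 8,000-square-foot lot and the bedroom counts are invented inputs used only for illustration:

```python
# Combined regression equation from the text.
# The lot size (8,000 sq ft) and bedroom counts are invented example inputs.
def house_price(lot_sqft, bedrooms):
    return -1993 + 21.6 * lot_sqft + 9333 * bedrooms

p3 = house_price(8000, 3)
p4 = house_price(8000, 4)
print(round(p3))       # 198806
print(round(p4 - p3))  # 9333: one more bedroom, same lot
```

Holding lot size fixed, each extra bedroom adds exactly the bedroom coefficient to the prediction; this is what "if the number of bedrooms is held constant" means in the interpretation above.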
3. Dummy variables

If we want to include a qualitative variable in the regression equation, there are ways of doing this. Say you want to see if ambulance calls vary, among other things, by whether the day is a weekday or a weekend. We can quantify this by using a dummy variable, where weekend = 1 if it's a weekend and weekend = 0 for a weekday. An actual equation for the city of Toronto is:

Ambulance calls = 445 + 3.1 (Temp. 5 PM) - 11 (weekend)

What this means is that for each degree increase in temperature there are 3.1 more ambulance calls; in terms of the dummy variable, a weekday would have 11 more ambulance calls than a weekend at the same temperature.

4. Selection of variables

So how do we end up selecting which variables to include in a model? There are a few different ways; we'll describe two here and then also discuss the example in Section 7.4. One way is to include all variables. If you only have a few, and you're sure they are not related to each other, this may work. Generally, though, this isn't recommended. The most common way is something called stepwise regression. Stepwise regression occurs (amazingly!) in steps, with one variable added or subtracted at a time. Recall how we can test the significance of a coefficient, as discussed last chapter. This procedure starts out testing
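The Toronto dummy-variable equation above can be written as a small function; the 25-degree temperature is an invented input for illustration:

```python
# Dummy-variable equation from the text:
# Ambulance calls = 445 + 3.1*(Temp. 5 PM) - 11*(weekend)
def ambulance_calls(temp_5pm, weekend):
    # weekend is the dummy: 1 on weekends, 0 on weekdays
    return 445 + 3.1 * temp_5pm - 11 * weekend

# Same temperature (25 degrees, an invented value), weekday vs. weekend:
weekday = ambulance_calls(25, 0)
weekend = ambulance_calls(25, 1)
print(round(weekday - weekend, 1))  # 11.0: the weekday has 11 more calls
```

The dummy simply shifts the intercept up or down for one category, leaving the temperature slope unchanged.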
ŷ = e^(a + bx) / (1 + e^(a + bx))

This yields values that range between 0 and 1 along an S-shaped curve. While the equation produces useful results, it is rather difficult to interpret the coefficient.
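A minimal sketch of this S-shaped curve; the values a = -2 and b = 1 are arbitrary choices made purely for illustration:

```python
import math

# The curve y-hat = e^(a+bx) / (1 + e^(a+bx)).
# a = -2 and b = 1 are arbitrary values chosen for illustration.
def logistic(x, a=-2.0, b=1.0):
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

# Output always stays between 0 and 1 along an S-shaped curve:
for x in (-5, 0, 2, 10):
    print(round(logistic(x), 3))  # 0.001, 0.119, 0.5, 1.0
```

The curve flattens near 0 and 1 and is steepest at its midpoint (here, where a + bx = 0), which is one reason a one-unit change in x has no single, constant effect on ŷ, making the coefficient hard to interpret directly.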
7. Other regression issues

Some other issues we'll discuss:

Autocorrelation
What if one value is correlated with the next? This can happen spatially or temporally. For instance, a hot day is more likely to be followed by another hot day. Not accounting for this can significantly bias a regression equation. A number of remedies can be used, including changing the variable to a differencing term, in other words the change in temperature from one day to the next.

Serial correlation
What about seasonal data? Something that you know varies on an annual cycle may give you misleading results if you don't account for this. We usually deal with this using running means or by standardizing the data to account for season.

Lag effects
Sometimes an observation on one day is better correlated with a dependent variable's value on a later day, if there's some sort of "lag effect". For instance, pollution levels downwind may not increase the day the pollution is released, but rather the day after. We can easily run a regression with the dependent variable's observations offset by a day or two to compensate.
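Two of the remedies above, differencing and lagging, can be sketched with NumPy; all three five-day series below are invented for the example:

```python
import numpy as np

# Invented five-day series used only to illustrate the two remedies.
temps = np.array([28.0, 30.0, 31.0, 27.0, 26.0])  # daily temperature
released = np.array([3.0, 6.0, 8.0, 9.0, 4.0])    # pollution released
downwind = np.array([5.0, 4.0, 7.0, 9.0, 10.0])   # downwind pollution level

# Differencing: replace the variable with its day-to-day change.
temp_change = np.diff(temps)
print(temp_change)  # [ 2.  1. -4. -1.]

# Lag effect: pair each day's release with the NEXT day's downwind
# reading by offsetting the dependent variable's observations by one day.
lagged = np.corrcoef(released[:-1], downwind[1:])[0, 1]
same_day = np.corrcoef(released, downwind)[0, 1]
print(lagged > same_day)  # True: the lagged pairing correlates better here
```

In these made-up data the downwind levels track the previous day's releases, so the one-day offset produces the stronger correlation, which is exactly the pattern a lag-adjusted regression would exploit.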