




Material Type: Assignment; Class: Econometrics; Subject: Economics; University: Notre Dame; Term: Unknown 1989;
Suggested Answers for Problem Set 6 ECON 30331
Dan Hungerman
A. Write down the formula for the OLS estimate of b_1.

\hat{b}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
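As a quick numerical check, the slope formula above (together with the first-order condition for the intercept, used later in part B) can be sketched in a few lines of Python; the data here are made up for illustration.

```python
# Sketch of the OLS formulas with invented data:
# b1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
# b0_hat = y_bar - b1_hat * x_bar
def ols_slope(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den

def ols_intercept(x, y):
    return sum(y) / len(y) - ols_slope(x, y) * sum(x) / len(x)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]
print(ols_slope(x, y))      # about 1.98
print(ols_intercept(x, y))  # about 0.05
```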
B. Using your answer in part A, explain what would happen if you instead multiplied both x_i and y_i by 2 and then did OLS on the transformed data. Does this affect the OLS estimates \hat{b}_1 or \hat{b}_0?

With the transformed data, the slope estimate is

\hat{b}_1^{new} = \frac{\sum_i (2x_i - 2\bar{x})(2y_i - 2\bar{y})}{\sum_i (2x_i - 2\bar{x})^2} = \frac{4\sum_i (x_i - \bar{x})(y_i - \bar{y})}{4\sum_i (x_i - \bar{x})^2} = \hat{b}_1

so the slope estimate is unaffected. The transformation does affect our estimate of b_0, however. The first-order condition for \hat{b}_0 is \hat{b}_0 = \bar{y} - \hat{b}_1\bar{x}. With the transformed data this becomes

\hat{b}_0^{new} = 2\bar{y} - \hat{b}_1(2\bar{x}) = 2(\bar{y} - \hat{b}_1\bar{x}) = 2\hat{b}_0
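This result is easy to verify numerically. The sketch below (invented data) rescales both x and y by 2 and confirms that the slope is unchanged while the intercept doubles.

```python
# Sketch with invented data: multiplying x and y by the constant 2
# leaves the OLS slope unchanged and doubles the intercept.
def ols(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / \
         sum((a - xb) ** 2 for a in x)
    return yb - b1 * xb, b1  # (intercept, slope)

x = [1.0, 2.0, 3.0, 5.0]
y = [3.0, 5.0, 7.5, 11.0]
b0, b1 = ols(x, y)
b0t, b1t = ols([2 * v for v in x], [2 * v for v in y])
print(abs(b1t - b1) < 1e-9)      # True: slope unchanged
print(abs(b0t - 2 * b0) < 1e-9)  # True: intercept doubled
```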
C. Your study partner Mr. Silly is now confused, because when studying WLS in class we did some similar transformations of variables and got something different from what we find in part B above. Why is the answer to part B above different from what we discussed in class?

This is different from WLS for at least two reasons. First, in WLS we typically multiply the data by a variable, not just a constant. Second, when we estimated WLS and multiplied terms in the regression equation by a variable, we multiplied all terms in the regression by that variable, including the constant. Here, we did not do that.
D. Would your answer for B change if this were a multivariate regression? That is, if there were several x variables, what would happen to our OLS estimates?

The answer in part B would still be correct: the coefficients for our x variables would be unchanged, and the intercept would double. To see this, start with the first-order conditions for the multivariate solution:

for b_0: \sum_i (y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik}) = 0
for b_j: \sum_i x_{ij}(y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik}) = 0, for j = 1, \ldots, k
Now consider a solution analogous to the one in part B: the coefficients for the x variables are unchanged and the intercept doubles. If that solution is correct, the first-order conditions (with the transformed data) would look like this:

for b_0: \sum_i \big(2y_i - 2b_0 - b_1(2x_{i1}) - \cdots - b_k(2x_{ik})\big) = 0
for b_j: \sum_i 2x_{ij}\big(2y_i - 2b_0 - b_1(2x_{i1}) - \cdots - b_k(2x_{ik})\big) = 0
But these can just be rewritten like this:

for b_0: \sum_i \big(2y_i - 2b_0 - b_1(2x_{i1}) - \cdots - b_k(2x_{ik})\big) = 2\sum_i (y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik}) = 0
for b_j: \sum_i 2x_{ij}\big(2y_i - 2b_0 - \cdots - b_k(2x_{ik})\big) = 4\sum_i x_{ij}(y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik}) = 0

These are the same first-order conditions that we solved before we transformed the data. Thus the slope coefficients are unchanged and the intercept doubles, just as in part B.
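The same invariance can be checked numerically in the multivariate case. This sketch (invented data; the small normal-equations solver is not from the problem set) compares the original and rescaled regressions.

```python
# Sketch: in a multivariate regression, multiplying y and every x by 2
# leaves the slope coefficients unchanged and doubles the intercept.

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting for a small system.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_multi(X, y):
    # X: list of rows [x1, x2, ...]; prepend a constant, solve X'X b = X'y.
    Z = [[1.0] + row for row in X]
    k = len(Z[0])
    XtX = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    Xty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    return solve(XtX, Xty)

X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0], [5.0, 5.0]]
y = [3.0, 4.0, 8.0, 7.0, 11.0]
b = ols_multi(X, y)
bt = ols_multi([[2 * v for v in row] for row in X], [2 * yi for yi in y])
print(b)
print(bt)  # intercept doubles; slope coefficients match b
```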
(analytic weights assumed)  (sum of wgt is 5.3644e-02)

      Source |       SS       df       MS              Number of obs =     114
-------------+------------------------------           F(  3,   110) =      9.
       Model |  230479.881     3  76826.6271           Prob > F      =      0.
    Residual |  908079.872   110  8255.27157           R-squared     =      0.
-------------+------------------------------           Adj R-squared =      0.
       Total |  1138559.75   113    10075.75           Root MSE      =     90.

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    bedrooms |   16.84739   12.28053     1.37   0.173    -7.489728         41.
         age |  -.0786841   .2701835    -0.29   0.771    -.6141244
     sq_feet |   .0982406    .024969     3.93   0.000     .0487579
       _cons |   104.8052   41.59718     2.52   0.013     22.36938        187.
------------------------------------------------------------------------------
Suppose that income_i is the income person i makes and pop is the population of the country where they live. Further, suppose that for person i living in country c, income depends on pop in the following way:

income_i = \alpha_0 + \alpha_1 pop_c + u_i

where u_i \sim N(0, \sigma_u^2).

A. Suppose you don't have individual-level data. All you can find is a country-level dataset, which reports for each country average income in the population, \overline{income}_c, and the population pop_c. So you will have to estimate:

\overline{income}_c = \alpha_0 + \alpha_1 pop_c + \bar{u}_c

What is the variance of \bar{u}_c? (Note that \bar{u}_c is an average of the u_i, averaged over pop_c unique observations. You can assume the covariance between any two u_i variables is zero.)

Here, \bar{u}_c is the average of the individual unobservables in the population:

\bar{u}_c = \frac{\sum_i u_i}{pop_c}
It is like we multiply each u_i by 1/pop and then add all the u_i terms together; if pop is a million, then we have a million of them to add together. If we multiply u_i by 1/pop, the product has variance \sigma_u^2/pop^2. So we have pop different terms to add together, and each one has variance \sigma_u^2/pop^2. If you add some independent variables together (where the covariance is zero), the variance of their sum is just the sum of their variances. So if there were 5 observations, the variance of \bar{u}_c would be 5\sigma_u^2/pop^2.

Here, the number of terms we are adding together is the number of people in the population, or pop. The variance of \bar{u}_c will be

pop \cdot \sigma_u^2/pop^2 = \sigma_u^2/pop
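A small simulation (invented numbers, not part of the assignment) illustrates the result: the variance of a mean of pop independent draws, each with variance \sigma_u^2, is \sigma_u^2/pop.

```python
# Simulation sketch: the mean of pop iid draws with variance sigma_u^2
# has variance sigma_u^2 / pop (here 4 / 50 = 0.08).
import random

random.seed(7)
sigma_u, pop, reps = 2.0, 50, 20000
means = []
for _ in range(reps):
    draws = [random.gauss(0.0, sigma_u) for _ in range(pop)]
    means.append(sum(draws) / pop)
m = sum(means) / reps
var_of_mean = sum((v - m) ** 2 for v in means) / reps
print(var_of_mean)  # close to 0.08
```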
B. Suppose you wanted to transform the equation in part A to perform a weighted-least-squares regression where the error term was appropriately homoskedastic. How could you transform the equation in part A?

You would estimate:

\sqrt{pop_c}\,\overline{income}_c = \alpha_0\sqrt{pop_c} + \alpha_1 pop_c\sqrt{pop_c} + \sqrt{pop_c}\,\bar{u}_c

If you multiply \bar{u}_c by \sqrt{pop_c}, the resulting variance is simply

pop_c \cdot \sigma_u^2/pop_c = \sigma_u^2
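A quick simulation sketch (country sizes invented) shows that scaling the averaged error by \sqrt{pop_c} makes its variance constant across countries.

```python
# Sketch: u_bar_c is the mean of pop iid N(0, sigma_u^2) draws, so it is
# distributed N(0, sigma_u^2 / pop); scaling by sqrt(pop) restores a
# constant variance of sigma_u^2 = 4 for every country size.
import random

random.seed(3)
sigma_u, reps = 2.0, 100000
scaled_vars = {}
for pop in (10, 100, 1000):
    vals = [pop ** 0.5 * random.gauss(0.0, sigma_u / pop ** 0.5)
            for _ in range(reps)]
    m = sum(vals) / reps
    scaled_vars[pop] = sum((v - m) ** 2 for v in vals) / reps
print(scaled_vars)  # every value close to 4.0
```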
C. What would you type into Stata to perform the regression in part B? Could you just type,
“regress income pop”?
No, that would be zany! Instead we would type this:
regress income pop [weight = pop]
Or even smarter, just to be super conservative:
regress income pop [weight = pop], robust
Suppose we want to estimate how the weight of a baby at birth, babyw, depends on the true weight of a pregnant woman at the end of her pregnancy, mommyw*. Such an estimation might be useful if a doctor would want to estimate how large a fetus is, based on a pregnant woman's weight.

So the equation we want to estimate is this, which we'll call equation (1):

babyw = a_0 + a_1 mommyw^* + e    (1)

Let's suppose that in our dataset we perfectly observe the baby's birthweight, but that mother's weight is reported by the women themselves and hence measured with error. So the regression we actually run is equation (2):

babyw = a_0 + a_1 mommyw + u    (2)

True mother's weight mommyw* is related to the observed mommyw as follows:

mommyw^* = b_1 mommyw + v    (3)
C. Suppose b_1 = 1. What is the expected value of the OLS estimate \hat{a}_1? (We did something similar to this back in September when we showed that OLS was unbiased, so your notes from back then may be helpful. Also, your answer might depend on the properties of the unobservables in equation (2), and you might want to think about that.)
With b_1 = 1, equation (3) says mommyw^* = mommyw + v, so substituting into equation (1) gives the regression in equation (2) with u_i = a_1 v_i + e_i. Letting d_i = mommyw_i - \overline{mommyw}, our answer becomes:

\hat{a}_1 = \frac{\sum_i d_i (a_1 d_i + u_i)}{\sum_i d_i^2} = a_1 + \frac{\sum_i d_i u_i}{\sum_i d_i^2}

So, given the mommyw values (and thus given d_i), the expected value of \hat{a}_1 will be

E(\hat{a}_1 \mid mommyw) = a_1 + \frac{\sum_i d_i\,E(u_i \mid mommyw)}{\sum_i d_i^2}
Looking back at equation (2), we can see that the unobservable there can be written as u_i = a_1 v_i + e_i. We'll assume as usual that the error term e has no relation to mommyw or mommyw*. But the error term v could. If E(v_i(mommyw_i - \overline{mommyw})) \neq 0, then OLS is biased.
Note this discussion has focused on bias, not (as in class) consistency. But clearly measurement error affects the unbiasedness of OLS, and as with consistency this depends on the correlation of the measurement error with the observed values of the x variables.
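That dependence on the observed values can be illustrated with a simulation (all numbers invented): with b_1 = 1, OLS on the observed mommyw is roughly unbiased when v is unrelated to the observed values, and biased when it is not.

```python
# Simulation sketch: u = a1*v + e in the observed regression. When v is
# unrelated to observed mommyw, the average slope estimate is near a1 = 2;
# when v is correlated with observed mommyw, it is biased (near 3 here).
import random

random.seed(11)
a0, a1, n, reps = 5.0, 2.0, 400, 300

def slope(x, y):
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / \
           sum((a - xb) ** 2 for a in x)

est_indep, est_corr = [], []
for _ in range(reps):
    mommyw = [random.gauss(150.0, 10.0) for _ in range(n)]
    mbar = sum(mommyw) / n
    # Case 1: measurement error unrelated to observed mommyw.
    v1 = [random.gauss(0.0, 5.0) for _ in range(n)]
    # Case 2: measurement error correlated with observed mommyw.
    v2 = [0.5 * (m - mbar) + random.gauss(0.0, 5.0) for m in mommyw]
    for v, store in ((v1, est_indep), (v2, est_corr)):
        star = [m + vi for m, vi in zip(mommyw, v)]  # b1 = 1
        babyw = [a0 + a1 * s + random.gauss(0.0, 3.0) for s in star]
        store.append(slope(mommyw, babyw))
print(sum(est_indep) / reps)  # near a1 = 2: roughly unbiased
print(sum(est_corr) / reps)   # near 3: biased
```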
Suppose the true model is

y = \beta_0 + \beta_1 x^* + e

But instead of observing the true x^* you observe x = x^* + v. So the regression you really estimate is

y = \beta_0 + \beta_1 x + u

Let v \sim N(0,1) and assume x^* \sim N(0,1), and also assume that x^* and v are independent (and thus have zero covariance). Suppose that the true value of \beta_1 was 4. What is the probability limit of the OLS estimate of \beta_1, and what is the covariance between the observed x and the error term u?
We saw in class that

\text{plim}\,\hat{\beta}_1 = \beta_1 \frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_v^2}

where the denominator is the variance of the observed x in the regression that is run. From the question we know that \sigma_{x^*}^2 = 1, \sigma_v^2 = 1, and \beta_1 = 4. Thus the probability limit of the OLS estimate of \beta_1 is 4 \cdot (1/2) = 2.
The term u can be expressed as e - \beta_1 v. If we assume that \text{cov}(e, x) = 0, then \text{cov}(x, e - \beta_1 v) = -\beta_1 \text{cov}(x, v). Since x = x^* + v and by assumption x^* and v are independent, this becomes

-\beta_1 \text{cov}(x, v) = -\beta_1 \text{cov}(x^* + v, v) = -\beta_1 \text{var}(v) = -\beta_1 \sigma_v^2

Plugging in, we see that the covariance is -4.
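Both answers can be checked by simulation (a sketch using the problem's numbers; sample size chosen for speed):

```python
# Sketch: x* ~ N(0,1), v ~ N(0,1), beta1 = 4. The OLS slope of y on the
# observed x = x* + v should be near the plim of 2, and cov(x, u) with
# u = e - beta1*v should be near -beta1*var(v) = -4.
import random

random.seed(5)
n, beta0, beta1 = 100000, 1.0, 4.0
xstar = [random.gauss(0.0, 1.0) for _ in range(n)]
v = [random.gauss(0.0, 1.0) for _ in range(n)]
e = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [a + b for a, b in zip(xstar, v)]
y = [beta0 + beta1 * a + c for a, c in zip(xstar, e)]
u = [c - beta1 * b for c, b in zip(e, v)]

xb, yb, ub = sum(x) / n, sum(y) / n, sum(u) / n
b1_hat = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
         sum((xi - xb) ** 2 for xi in x)
cov_xu = sum((xi - xb) * (ui - ub) for xi, ui in zip(x, u)) / n
print(b1_hat)  # near 2
print(cov_xu)  # near -4
```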
Mr. Silly argues that if the measurement error is correlated with the true x variable, but uncorrelated with what is observed, then you must have a serious problem, since the true x variable is what you care about. Does Mr. Silly's reasoning make sense, or is he being silly again?
Mr. Silly is wrong again. The key is whether measurement error is correlated with what you actually use in the regression. Measurement error that is correlated only with the true x variable, x*, is not a serious concern. (Measurement error in the dependent variable is also not as big a concern.) What matters is what is going on with the variables you are actually using to make your estimates. Unfortunately, it is often the case that in the presence of measurement error we would expect the nature of the error to be different depending on the values of x we observe, and so we would then have to worry about attenuation bias.