
Basic R Programming: Exercises

R Programming

John Fox

ICPSR, Summer 2009

  1. Logistic Regression: Iterated weighted least squares (IWLS) is a standard method of fitting generalized linear models to data. As described in Section 5.5 of An R and S-PLUS Companion to Applied Regression (Fox, 2002), the IWLS algorithm applied to binomial logistic regression proceeds as follows:

(a) Set the regression coefficients to initial values, such as β^(0) = 0 (where the superscript 0 indicates start values).
(b) At each iteration t, calculate the current fitted probabilities μ, variance-function values v, working-response values z, and weights w:

μ_i^(t) = [1 + exp(−η_i^(t))]^(−1)
v_i^(t) = μ_i^(t) (1 − μ_i^(t))
z_i^(t) = η_i^(t) + (y_i − μ_i^(t)) / v_i^(t)
w_i^(t) = n_i v_i^(t)

Here, n_i represents the binomial denominator for the ith observation; for binary data, all of the n_i are 1.
(c) Regress the working response on the predictors by weighted least squares, minimizing the weighted residual sum of squares

Σ_{i=1}^{n} w_i^(t) (z_i^(t) − x_i′β)^2

where x_i′ is the ith row of the model matrix.
(d) Repeat steps (b) and (c) until the regression coefficients stabilize at the maximum-likelihood estimator β̂.
(e) Calculate the estimated asymptotic covariance matrix of the coefficients as

V̂(β̂) = (X′WX)^(−1)

where W = diag{w_i} is the diagonal matrix of weights from the last iteration and X is the model matrix.

Problem: Program this method in R. The function that you define should take (at least) three arguments: The model matrix X; the response vector of observed proportions y; and

the vector of binomial denominators n. I suggest that you let n default to a vector of 1s (i.e., for binary data, where y consists of 0s and 1s), and that you attach a column of 1s to the model matrix for the regression constant so that the user does not have to do this when the function is called.

Programming hints:

  • Adapt the structure of the example developed in Section 8.5.1 of “Writing Programs” (Fox and Weisberg, draft), but note that this example is for binary logistic regression, while the current exercise is to program the more general binomial logit model.
  • Use the lsfit function to get the weighted-least-squares fit, calling the function as lsfit(X, z, w, intercept=FALSE), where X is the model matrix; z is the current working response; and w is the current weight vector. The argument intercept=FALSE is needed because the model matrix already has a column of 1s. The function lsfit returns a list, with element $coef containing the regression coefficients. See ?lsfit for details.
  • One tricky point is that lsfit requires that the weights (w) be a vector, while your calculation will probably produce a one-column matrix of weights. You can coerce the weights to a vector using the function as.vector.
  • Return a list with the maximum-likelihood estimates of the coefficients, the covariance matrix of the coefficients, and the number of iterations required.
  • You can test your function on the Mroz data in the car package, being careful to make all of the variables numeric. You might also try fitting a binomial (as opposed to binary) logit model.
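Putting these hints together, a minimal sketch of the IWLS fitter might look like the following. The function name lregIWLS, the convergence tolerance, and the iteration cap are my own illustrative choices, not part of the exercise.

```r
# Sketch of binomial logistic regression by IWLS (steps (a)-(e) above).
# The name lregIWLS, tol, and max.iter are illustrative assumptions.
lregIWLS <- function(X, y, n = rep(1, length(y)),
                     max.iter = 25, tol = 1e-8) {
  X <- cbind(1, X)                    # attach a column of 1s for the constant
  b <- rep(0, ncol(X))                # start values: beta^(0) = 0
  for (it in 1:max.iter) {
    eta <- as.vector(X %*% b)         # linear predictor
    mu <- 1/(1 + exp(-eta))           # current fitted probabilities
    v <- mu*(1 - mu)                  # variance-function values
    z <- eta + (y - mu)/v             # working response
    w <- as.vector(n*v)               # weights, coerced to a vector for lsfit
    b.old <- b
    b <- lsfit(X, z, w, intercept = FALSE)$coef
    if (max(abs(b - b.old)) < tol) break
  }
  V <- solve(t(X) %*% diag(w) %*% X)  # estimated asymptotic covariance matrix
  list(coefficients = b, var = V, iterations = it)
}
```

Because glm(..., family = binomial) computes the same maximum-likelihood estimates, comparing the two fits is a convenient check of your function.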
  2. A Challenging Problem – Ordered Logit and Probit Models: Ordered logit and probit models are popular regression models for ordinal response variables; the ordered logit model is also called the proportional-odds model (see below for an explanation). The following description is adapted from Fox, Applied Regression Analysis and Generalized Linear Models, Second Edition (2008, Ch. 14): Imagine that there is a latent (i.e., unobservable) variable ξ that is a linear function of the X's plus a random error:

ξ_i = α + β_1 X_i1 + · · · + β_k X_ik + ε_i

The latent response ξ is dissected by m − 1 thresholds (i.e., boundaries) into m regions. Denoting the thresholds by α_1 < α_2 < · · · < α_(m−1), and the resulting response by Y, we observe

Y_i = 1      if ξ_i ≤ α_1
      2      if α_1 < ξ_i ≤ α_2
      ...
      m − 1  if α_(m−2) < ξ_i ≤ α_(m−1)
      m      if α_(m−1) < ξ_i

The thresholds, regions, and corresponding values of ξ and Y are represented graphically in the following figure. Notice that the thresholds are not in general uniformly spaced.

and thus

Pr(Y_i > j) = Λ(α_j + β_1 X_i1 + · · · + β_k X_ik),  j = 1, . . . , m − 1   (3)
            = Λ(α_j + x_i′β)

where Λ(·) is the cumulative logistic distribution. In this parametrization, the intercepts α_j are the negatives of the category thresholds. The ordered probit model is similar, with

Pr(Y_i > j) = Φ(α_j + β_1 X_i1 + · · · + β_k X_ik),  j = 1, . . . , m − 1   (5)
            = Φ(α_j + x_i′β)

where Φ(·) is the cumulative normal distribution.

The log-likelihood under both the ordered logit and ordered probit model takes the following form:

log_e L(α, β) = Σ_{i=1}^{n} log_e (π_i1^(w_i1) π_i2^(w_i2) · · · π_im^(w_im))

where α is an (m − 1) × 1 vector containing all of the regression constants and β is the k × 1 vector containing the other regression coefficients; π_ij = Pr(Y_i = j) (i.e., the probability under the model that individual i is in response category j); and the w_ij are indicator variables equal to 1 if individual i is observed in category j and 0 otherwise. Thus, for each individual, only one of the w_ij is equal to 1 and only the corresponding π_ij contributes to the likelihood. Note that for either the ordered logit model or the ordered probit model, the individual-category probabilities can be computed as differences between adjacent cumulative probabilities from Equation 3 or 5 (which are functions of the parameters):

π_i1 = 1 − Pr(Y_i > 1)
π_i2 = Pr(Y_i > 1) − Pr(Y_i > 2)
...
π_im = Pr(Y_i > m − 1)
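As a concrete illustration of these differences, take hypothetical values α_1 = 1, α_2 = −1, and x_i′β = 0.5 for a single observation with m = 3 categories (all numbers invented purely for illustration):

```r
# Hypothetical parameter values for one observation, m = 3 categories
a <- c(1, -1)              # intercepts, with alpha_1 > alpha_2
xb <- 0.5                  # x_i' beta
cum <- plogis(a + xb)      # Pr(Y > 1), Pr(Y > 2), as in Equation 3
pr <- c(1 - cum[1],        # pi_1 = 1 - Pr(Y > 1)
        cum[1] - cum[2],   # pi_2 = Pr(Y > 1) - Pr(Y > 2)
        cum[2])            # pi_3 = Pr(Y > 2)
sum(pr)                    # the category probabilities sum to 1
```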

Problem: Program the ordered-logit model or the ordered-probit model (or both). The function that you define should take (at least) two arguments: the model matrix X and the response vector y, which should be a factor or ordered factor. I suggest that you attach a column of 1s to the model matrix for the regression constants so that the user does not have to do this when the function is called; the ordered logit and probit models always have constants. Your function can include an argument to indicate which model – logit or probit – is to be fit.

Programming hints:

  • The parameters consist of the intercepts and the other regression coefficients, say a and b. Although there are cleverer ways to proceed, you can set b to a vector of zeroes to start, and compute start values for a from the marginal distribution of the response; e.g.,

      marg.p <- rev(cumsum(rev(table(y)/n)))[-1]
      a <- log(marg.p/(1 - marg.p))

    Here y is the response vector and n is the number of observations.
  • If you’re fitting the ordered logit model, use the cumulative logistic distribution func- tion plogis(); if you’re fitting the ordered probit model, use the cumulative normal distribution function pnorm().
  • Use optim() to maximize the likelihood, treating the lreg2() function in Section 8.5 of “Writing Programs” (Fox and Weisberg, draft) as a model, but noting that for the ordered logit and probit models, I have shown only the log-likelihood and not the gradient.
  • Return a list with the maximum-likelihood estimates of the coefficients, including the intercepts or thresholds (negatives of the intercepts); the covariance matrix of the coefficients (obtained from the inverse Hessian returned by optim()); the residual deviance for the model (i.e., minus twice the maximized log-likelihood); and an indication of whether or not the computation converged.
  • The ordered logit and probit models may be fit by the polr() function in the MASS package (one of the standard R packages). You can use polr() to verify that your function works properly. To test your program, you can use the WVS dataset in the effects package. For testing purposes, use a simple additive model rather than the model with interactions given in ?WVS.
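Combining these hints, one possible sketch of the ordered-logit fitter follows. The function name ologit is my own, and the floor that guards the probabilities against log(0) during optimization is an extra assumption, not part of the exercise.

```r
# Sketch of the ordered-logit model fit by maximizing the log-likelihood
# with optim(); the name ologit and the probability floor are my choices.
ologit <- function(X, y) {
  X <- as.matrix(X)
  y <- as.integer(as.ordered(y))          # response categories 1, ..., m
  n <- length(y); m <- max(y); k <- ncol(X)
  negLL <- function(par) {
    a <- par[1:(m - 1)]                   # intercepts (negatives of thresholds)
    b <- par[-(1:(m - 1))]
    xb <- as.vector(X %*% b)
    cum <- sapply(a, function(aj) plogis(aj + xb))  # Pr(Y > j), n x (m-1)
    P <- cbind(1, cum) - cbind(cum, 0)    # category probabilities pi_ij
    -sum(log(pmax(P[cbind(1:n, y)], 1e-12)))  # floor guards against log(0)
  }
  marg.p <- rev(cumsum(rev(table(y)/n)))[-1]       # start values for a
  start <- c(log(marg.p/(1 - marg.p)), rep(0, k))  # b starts at zero
  fit <- optim(start, negLL, method = "BFGS", hessian = TRUE)
  list(coefficients = fit$par,
       var = solve(fit$hessian),          # inverse Hessian of -logL
       deviance = 2*fit$value,            # minus twice the maximized logL
       converged = fit$convergence == 0)
}
```

When checking against polr(), remember that polr() reports thresholds ζ_j = −α_j, so only the slope coefficients match this parametrization directly.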
  3. General Cumulative Logit and Probit Models: The ordered logit and probit models of the previous problem make the strong assumption that all m − 1 cumulative probabilities can be modeled with the same regression coefficients, except for different intercepts. More general versions of these models permit different regression coefficients:

Pr(Y_i > j) = Λ(α_j + β_1j X_i1 + · · · + β_kj X_ik),  j = 1, . . . , m − 1

or

Pr(Y_i > j) = Φ(α_j + β_1j X_i1 + · · · + β_kj X_ik),  j = 1, . . . , m − 1

Problem: Program one or the other (or both) of these models. For your example regression, use a likelihood-ratio test to compare the more general cumulative logit or probit model to the more restrictive ordered logit or probit model of the preceding problem. This test checks the assumption of equal slopes. The cumulative logit and probit models (along with the ordered logit and probit models) can be fit by the vglm() function in the VGAM package.
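The likelihood-ratio test itself is simple once both residual deviances are in hand. A small helper is sketched below; the name lr.test is mine, and the deviance values in the example call are invented purely for illustration.

```r
# Likelihood-ratio test of the equal-slopes assumption: the chi-square
# statistic is the difference in residual deviances, with degrees of
# freedom equal to the number of extra parameters, (m - 2)*k.
lr.test <- function(dev.restricted, dev.general, df) {
  chisq <- dev.restricted - dev.general
  c(chisq = chisq, df = df,
    p = pchisq(chisq, df, lower.tail = FALSE))
}

# Invented deviances for illustration only:
out <- lr.test(dev.restricted = 110.4, dev.general = 100.4, df = 5)
```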

  4. Numerical Linear Algebra: A matrix is said to be in reduced row-echelon form when it satisfies the following criteria:

(a) All of its nonzero rows (if any) precede all of its zero rows (if any).
(b) The first entry (from left to right) – called the leading entry – in each nonzero row is 1.
(c) The leading entry in each nonzero row after the first is to the right of the leading entry in the previous row.
(d) All other entries are 0 in a column containing a leading entry.

A matrix can be put into reduced row echelon form by a sequence of elementary row opera- tions, which are of three types:

(a) Multiply each entry in a row by a nonzero constant.
(b) Exchange two rows.
(c) Add a multiple of one row to another row.

The matrix is now in reduced row-echelon form. The rank of a matrix is the number of nonzero rows in its reduced row-echelon form, and so the matrix in this example is of rank 2.

Problem: Write an R function to calculate the reduced row-echelon form of a matrix by elimination.

Programming hints:

  • When you do “floating-point” arithmetic on a computer, there are almost always rounding errors. One consequence is that you cannot rely on a number being exactly equal to a value such as 0. When you test that an element, say x, is 0, therefore, you should do so within a tolerance – e.g., |x| < 1 × 10^−6.
  • The computations tend to be more accurate if the absolute values of the pivots are as large as possible. Consequently, you can exchange a row for a lower one to get a larger pivot even if the element in the pivot position is nonzero.
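A sketch along the lines of these hints is below; the function name rref and the default tolerance are my own choices.

```r
# Reduced row-echelon form by Gaussian elimination with partial
# pivoting; elements smaller than tol in absolute value are treated as 0.
rref <- function(A, tol = 1e-6) {
  A <- as.matrix(A)
  n <- nrow(A); p <- ncol(A)
  row <- 1
  for (col in 1:p) {
    if (row > n) break
    pivot <- which.max(abs(A[row:n, col])) + row - 1  # largest pivot below
    if (abs(A[pivot, col]) < tol) next                # no pivot in this column
    A[c(row, pivot), ] <- A[c(pivot, row), ]          # exchange rows
    A[row, ] <- A[row, ]/A[row, col]                  # scale leading entry to 1
    others <- setdiff(1:n, row)
    A[others, ] <- A[others, ] - outer(A[others, col], A[row, ])
    row <- row + 1
  }
  A
}
```

Counting the nonzero rows of the result then gives the rank of the matrix.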
  5. A less difficult problem: Write a function to compute running medians. Running medians are a simple smoothing method usually applied to time series. For example, for the numbers 7, 5, 2, 8, 5, 5, 9, 4, 7, 8, the running medians of length 3 are 5, 5, 5, 5, 5, 5, 7, 7. The first running median is the median of the three numbers 7, 5, and 2; the second running median is the median of 5, 2, and 8; and so on. Your function should take two arguments: the data (say, x), and the number of observations for each median (say, length). Notice that there are fewer running medians than observations. How many fewer?
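A sketch of such a function (the argument names follow the problem statement; base::length() is called explicitly because the second argument is also named length):

```r
# Running medians of the given window length; there are length - 1
# fewer medians than observations.
run.median <- function(x, length = 3) {
  n.out <- base::length(x) - length + 1
  sapply(1:n.out, function(i) median(x[i:(i + length - 1)]))
}

run.median(c(7, 5, 2, 8, 5, 5, 9, 4, 7, 8))  # 5 5 5 5 5 5 7 7
```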
  6. Simulation: Develop a simulation illustrating the central limit theorem for the mean: Almost regardless of the population distribution of X, the mean X̄ of repeated samples of size n drawn from the population is approximately normally distributed, with mean E(X̄) = E(X) = μ and variance V(X̄) = V(X)/n = σ^2/n, and with the approximation improving as the sample size grows. Sample from a highly skewed distribution, such as the exponential distribution with a small “rate” parameter λ (e.g., λ = 1); use several different sample sizes, such as 1, 2, 5, 25, and 100, and draw many samples of each size, comparing the observed distribution of sample means with the approximating normal distribution. Exponential random variables may be generated in R using the rexp() function. Note that the mean of an exponential random variable is 1/λ and its variance is 1/λ^2.
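One way to organize the simulation is sketched below; the sample sizes come from the problem, while reps = 5000 and the seed are arbitrary choices of mine.

```r
# Simulate the CLT for means of exponential samples: for each n, draw
# many samples and compare the mean and variance of the sample means
# with the theoretical values mu = 1/lambda and sigma^2/n = 1/(lambda^2 n).
set.seed(123)
clt.sim <- function(n, reps = 5000, rate = 1) {
  means <- replicate(reps, mean(rexp(n, rate)))
  # to visualize: hist(means, freq = FALSE), then overlay the density
  # curve(dnorm(x, 1/rate, 1/(rate*sqrt(n))), add = TRUE)
  c(n = n, mean = mean(means), var = var(means))
}
res <- t(sapply(c(1, 2, 5, 25, 100), clt.sim))
res  # means should be near 1 and variances near 1/n for lambda = 1
```

For small n the histogram of means is visibly skewed like the exponential itself; by n = 25 or 100 it is close to the approximating normal.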