





The Weak Law of Large Numbers, The Central Limit Theorem
Typology: Lecture notes
1 Introduction

Probability Theory includes various theorems known as Laws of Large Numbers; for instance, see [Fel68, Hea71, Ros89]. Usually two major categories are distinguished: Weak Laws versus Strong Laws. Within these categories there are numerous subtle variants of differing generality. Also the Central Limit Theorems are often brought up in this context. Many introductory probability texts treat this topic superficially, and more than once their vague formulations are misleading or plainly wrong. In this note, we consider a special case to clarify the relationship between the Weak and Strong Laws. The reason for doing so is that I have not been able to find a concise formal exposition all in one place. The material presented here is certainly not new and was gleaned from many sources.

In the following sections, $X_1, X_2, \ldots$ is a sequence of independent and identically distributed random variables with finite expectation $\mu$. We define the associated sequence $\bar X_n$ of partial sample means by

$$\bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
The Laws of Large Numbers make statements about the convergence of $\bar X_n$ to $\mu$. Both laws relate bounds on sample size, accuracy of approximation, and degree of confidence. The Weak Laws deal with limits of probabilities involving $\bar X_n$. The Strong Laws deal with probabilities involving limits of $\bar X_n$. Especially the mathematical underpinning of the Strong Laws requires a careful approach ([Hea71, Ch. 5] is an accessible presentation).
2 The Weak Law of Large Numbers

Let’s not beat about the bush. Here is what the Weak Law says about the convergence of $\bar X_n$ to $\mu$.
2.1 Theorem (Weak Law of Large Numbers) We have

$$\forall \varepsilon > 0 \quad \lim_{n \to \infty} \Pr\left( |\bar X_n - \mu| \le \varepsilon \right) = 1. \tag{1}$$
This is often abbreviated to

$$\bar X_n \xrightarrow{P} \mu \quad \text{as } n \to \infty,$$

or in words: $\bar X_n$ converges in probability to $\mu$ as $n \to \infty$.
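As an illustration (not part of the original exposition), a short Monte Carlo sketch can show these probabilities approaching 1. The choice of distribution (a fair die, so $\mu = 3.5$), the value of $\varepsilon$, and the trial count are arbitrary demo parameters:

```python
import random

# Monte Carlo sketch of the Weak Law: estimate Pr(|X̄_n - mu| <= eps)
# for averages of n fair die rolls. All parameters are demo choices.
random.seed(1)
mu, eps, trials = 3.5, 0.25, 2000

def prob_within(n):
    """Estimate Pr(|X̄_n - mu| <= eps) over `trials` independent samples."""
    hits = 0
    for _ in range(trials):
        xbar = sum(random.randint(1, 6) for _ in range(n)) / n
        if abs(xbar - mu) <= eps:
            hits += 1
    return hits / trials

for n in (10, 100, 1000):
    print(n, prob_within(n))
```

The estimated probability climbs toward 1 as $n$ grows, as Equation (1) asserts.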
On account of the definition of limit and the fact that probabilities are at most 1, Equation (1) can be rewritten as

$$\forall \varepsilon > 0 \ \forall \delta > 0 \ \exists N > 0 \ \forall n \ge N \quad \Pr\left( |\bar X_n - \mu| \le \varepsilon \right) \ge 1 - \delta. \tag{2}$$
The proof of the Weak Law is easy when the $X_i$’s have a finite variance. It is most often based on Chebyshev’s Inequality.
2.2 Theorem (Chebyshev’s Inequality) Let $X$ be a random variable with finite mean $\mu$ and finite variance $\sigma^2$. Then for all $a > 0$ we have

$$\Pr(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}.$$
A slightly different way of putting it is this: for all $a > 0$, we have

$$\Pr(|X - \mu| \ge a\sigma) \le \frac{1}{a^2}.$$
Thus, the probability that $X$ deviates from its expected value by at least $k$ standard deviations is at most $1/k^2$. Chebyshev’s Inequality is sharp when no further assumptions are made about $X$’s distribution, but for practical applications it is often too sloppy. For example, the probability that $X$ remains within $3\sigma$ of $\mu$ is at least $8/9$, no matter what distribution $X$ has. However, when $X$ is known to have a normal distribution, this probability in fact exceeds $0.997$.

We now prove the Weak Law when the variance is finite. Let $\sigma^2$ be the variance of each $X_i$. In that case, we have $\mathrm{E}\,\bar X_n = \mu$ and $\mathrm{Var}\,\bar X_n = \sigma^2/n$. Let $\varepsilon > 0$. Substituting $X, \mu, \sigma, a := \bar X_n, \mu, \sigma/\sqrt{n}, \varepsilon$ in Chebyshev’s Inequality then yields
$$\Pr\left( |\bar X_n - \mu| \ge \varepsilon \right) \le \frac{\sigma^2}{n\varepsilon^2}. \tag{3}$$
Hence, for $\delta > 0$ and for all $n \ge \max\{1, \sigma^2/(\delta\varepsilon^2)\}$ we have

$$\Pr\left( |\bar X_n - \mu| < \varepsilon \right) \ge 1 - \delta,$$

which completes the proof.
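The $k$-standard-deviation comparison above can be checked numerically. The sketch below (an illustration, not part of the original argument) contrasts the distribution-free Chebyshev guarantee $\Pr(|X - \mu| < k\sigma) \ge 1 - 1/k^2$ with the exact probability $2\Phi(k) - 1$ for a normal $X$:

```python
from statistics import NormalDist

# Compare the Chebyshev lower bound on Pr(|X - mu| < k*sigma), valid for
# any distribution, with the exact value when X is normal: 2*Phi(k) - 1.
for k in (1, 2, 3):
    chebyshev = 1 - 1 / k**2
    normal = 2 * NormalDist().cdf(k) - 1
    print(f"k={k}: Chebyshev >= {chebyshev:.4f}, normal = {normal:.4f}")
```

For $k = 3$ this reproduces the figures in the text: the bound $8/9 \approx 0.889$ versus roughly $0.9973$ for the normal case.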
If convergence to the standard normal distribution is assumed to be ‘good’ (much better than $\delta$), then we can take bound $N$ such that

$$\Phi\left( \sqrt{N}\,\frac{\varepsilon}{\sigma} \right) \ge 1 - \frac{\delta}{2}.$$
Compare this to the bound $N \ge \sigma^2/(\delta\varepsilon^2)$ on account of Chebyshev’s Inequality. As an example, consider the case where we want to be 95% certain that the sample mean falls within $\frac{1}{4}\sigma$ of $\mu$; that is, $\delta = 0.05$ and $\varepsilon = \sigma/4$. Chebyshev’s Inequality yields $N \ge 16/0.05 = 320$, and the standard normal approximation yields $\sqrt{N}/4 \ge 1.96$, or $N \ge 61.47$. Thus, if the standard normal approximation is ‘good’, then our need is already fulfilled by the mean of 62 samples, instead of the 320 required by Chebyshev’s Inequality.
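The two sample-size bounds in this example can be recomputed mechanically. This sketch (an illustration, not part of the original note) expresses both bounds in terms of the ratio $\varepsilon/\sigma$, so $\sigma$ itself drops out:

```python
import math
from statistics import NormalDist

# Sample-size bounds for delta = 0.05 and eps = sigma/4.
# Chebyshev: N >= sigma^2 / (delta * eps^2).
# Normal approximation: sqrt(N) * eps / sigma >= z, where Phi(z) = 1 - delta/2.
delta = 0.05
eps_over_sigma = 0.25                            # eps = sigma / 4

n_chebyshev = math.ceil(1 / (delta * eps_over_sigma**2))
z = NormalDist().inv_cdf(1 - delta / 2)          # ~1.96
n_normal = math.ceil((z / eps_over_sigma) ** 2)

print(n_chebyshev, n_normal)                     # → 320 62
```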
I would like to emphasize the following points concerning the Central Limit Theorem.
3 The Strong Law of Large Numbers
Let’s start again with the theorem.
3.1 Theorem (Strong Law of Large Numbers) We have

$$\Pr\left( \lim_{n \to \infty} \bar X_n = \mu \right) = 1. \tag{7}$$
This is often abbreviated to

$$\bar X_n \xrightarrow{\text{a.s.}} \mu \quad \text{as } n \to \infty,$$

or in words: $\bar X_n$ converges almost surely to $\mu$ as $n \to \infty$.
One of the problems with such a law is the assignment of probabilities to statements involving infinitely many random variables. For that purpose, one needs a careful introduction of notions like sample space, probability measure, and random variable. See for instance [Tuc67, Hea71, Chu74a, LR79]. Using some Probability Theory, the Strong Law can be rewritten into a form with probabilities involving finitely many random variables only. We rewrite Equation (7) in a chain of equivalences:
$$\Pr\left( \lim_{n \to \infty} \bar X_n = \mu \right) = 1 \tag{7}$$
⇔ { definition of limit }
$$\Pr\left( \forall \varepsilon > 0 \ \exists N > 0 \ \forall n \ge N \quad |\bar X_n - \mu| \le \varepsilon \right) = 1 \tag{8}$$
⇔ { Note 1 below }
$$\forall \varepsilon > 0 \quad \Pr\left( \exists N > 0 \ \forall n \ge N \quad |\bar X_n - \mu| \le \varepsilon \right) = 1 \tag{9}$$
⇔ { Note 2 below }
$$\forall \varepsilon > 0 \ \forall \delta > 0 \ \exists N > 0 \quad \Pr\left( \forall n \ge N \quad |\bar X_n - \mu| \le \varepsilon \right) \ge 1 - \delta \tag{10}$$
⇔ { Note 3 below }
$$\forall \varepsilon > 0 \ \forall \delta > 0 \ \exists N > 0 \ \forall r \ge 0 \quad \Pr\left( \forall N \le n \le N + r \quad |\bar X_n - \mu| \le \varepsilon \right) \ge 1 - \delta \tag{11}$$
Comparing Equations (2) and (10), we immediately infer the Weak Law from the Strong Law, which explains their names. In order to supply the notes to the above derivation, let $(\Omega, \mathcal{F}, P)$ be an appropriate probability space for the random variables $X_i$, and define events $A_\varepsilon$, $B_N$, and $C_r$ for $\varepsilon > 0$, $N > 0$, and $r \ge 0$ by
$$A_\varepsilon = \{\omega \in \Omega \mid \exists N > 0 \ \forall n \ge N \quad |\bar X_n(\omega) - \mu| \le \varepsilon\}$$
$$B_N = \{\omega \in \Omega \mid \forall n \ge N \quad |\bar X_n(\omega) - \mu| \le \varepsilon\}$$
$$C_r = \{\omega \in \Omega \mid \forall N \le n \le N + r \quad |\bar X_n(\omega) - \mu| \le \varepsilon\}.$$
These events satisfy the following monotonicity properties:
$$A_\varepsilon \supseteq A_{\varepsilon'} \ \text{for } \varepsilon \ge \varepsilon', \qquad B_N \subseteq B_{N+1}, \qquad C_r \supseteq C_{r+1}.$$
Therefore, on account of the continuity of the probability measure $P$ for monotonic chains of events, we have
$$P\Big( \bigcap_{m=1}^{\infty} A_{1/m} \Big) = \lim_{m \to \infty} P(A_{1/m}) \tag{12}$$
$$P\Big( \bigcup_{N=1}^{\infty} B_N \Big) = \lim_{N \to \infty} P(B_N) \tag{13}$$
$$P\Big( \bigcap_{r=0}^{\infty} C_r \Big) = \lim_{r \to \infty} P(C_r). \tag{14}$$
Note 1. We derive

$$\Pr\left( \forall \varepsilon > 0 \ \exists N > 0 \ \forall n \ge N \quad |\bar X_n - \mu| \le \varepsilon \right) = 1$$
⇔ { definitions of Pr and $A_\varepsilon$ }
$$P\Big( \bigcap_{\varepsilon > 0} A_\varepsilon \Big) = 1$$
⇔ { monotonicity of $A_\varepsilon$, using $1/m \to 0$ as $m \to \infty$ }
$$P\Big( \bigcap_{m=1}^{\infty} A_{1/m} \Big) = 1$$
⇔ { (12) }
$$\lim_{m \to \infty} P(A_{1/m}) = 1$$
⇔ { $P(A_{1/m})$ is non-increasing in $m$ and at most 1; monotonicity of $A_\varepsilon$ }
$$\forall \varepsilon > 0 \quad P(A_\varepsilon) = 1,$$

which is the probability statement in Equation (9).
Equation (9) can be recognized in [Ros89]:

“In particular, [the Strong Law] shows that, with probability 1, for any positive value $\varepsilon$,
$$\left| \frac{\sum_{i=1}^{n} X_i}{n} - \mu \right|$$
will be greater than $\varepsilon$ only a finite number of times.”
Equation (10) can be recognized in [Hea71, p. 226]:

“Indeed for arbitrarily small $\varepsilon > 0$, $\delta > 0$, and large $N = N(\varepsilon, \delta)$, ... the [definition] of $X_n \xrightarrow{\text{a.s.}} X$ ... can be restated ... as
$$P\Big( \bigcap_{n=N}^{\infty} \{\omega \mid |X_n(\omega) - X(\omega)| < \varepsilon\} \Big) \ge 1 - \delta.\text{”}$$
Equation (11) resembles the definition in [Fel68, p. 259]:

“We say that the sequence $X_k$ obeys the strong law of large numbers if to every pair $\varepsilon > 0$, $\delta > 0$, there corresponds an $N$ such that there is probability $1 - \delta$ or better that for every $r > 0$ all $r + 1$ inequalities
$$\frac{|S_n - m_n|}{n} < \varepsilon, \qquad n = N, N+1, \ldots, N+r$$
will be satisfied.”
Again the case with finite variance is easier than the general case. Most of the proofs that I have encountered for the Strong Law assuming finite variance are based on Kolmogorov’s Inequality, which is a generalization of Chebyshev’s Inequality. Even in that case there are still some technical hurdles (I will not go into these). Consequently, the proofs do not give rise to an explicit bound $N$ in (11) in terms of $\varepsilon$ and $\delta$. An exception is [Eis69], where the Hájek–Rényi Inequality is used, which is a generalization of Kolmogorov’s Inequality. There is, however, a nice overview article, namely [HR80], that specifically looks at bounds $N$ in terms of $\varepsilon$ and $\delta$. It shows, among other things, that
$$\Pr\left( \exists n \ge N \quad |\bar X_n - \mu| \ge \varepsilon \right) \le \frac{\sigma^2}{N\varepsilon^2}.$$
Compare this result to (3): it is the same upper bound but for a much larger event. This creates the impression that the Weak Law is not that much weaker than the Strong Law.
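This uniform bound can also be probed by simulation. The sketch below (an illustration, not from the original note) estimates the probability that the running mean of Bernoulli(1/2) variables ever strays by $\varepsilon$ or more somewhere in $N \le n \le$ `horizon`; since only finitely many $n$ can be simulated, the estimate is a lower bound for the probability of the full event $\exists n \ge N$. The parameters are arbitrary demo choices:

```python
import random

# Monte Carlo probe of Pr(exists n >= N: |X̄_n - mu| >= eps) against the
# bound sigma^2 / (N * eps^2), for X_i ~ Bernoulli(1/2).
# Truncating "exists n >= N" to N <= n <= horizon underestimates the event.
random.seed(7)
mu, sigma2 = 0.5, 0.25
N, horizon, eps = 100, 1000, 0.15
trials = 1000

bound = sigma2 / (N * eps**2)
violations = 0
for _ in range(trials):
    total = sum(random.randint(0, 1) for _ in range(N - 1))
    for n in range(N, horizon + 1):
        total += random.randint(0, 1)       # total now holds n samples
        if abs(total / n - mu) >= eps:
            violations += 1
            break
empirical = violations / trials
print(f"empirical >= {empirical:.3f}, bound = {bound:.3f}")
```

The empirical frequency typically lands far below the bound of $0.25/2.25 \approx 0.111$, consistent with the remark that the bound covers a much larger event than (3) does, at the same price.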
4 Concluding Remarks
We have looked at one special case to clarify the relationship between the Weak and the Strong Law of Large Numbers. The case was special in that we have assumed $X_i$ to be a sequence of independent and identically distributed random variables with finite expectation, and that we have considered the convergence of partial sample means to the common expectation. Historically, it was preceded by more special cases, for instance, with the $X_i$ restricted to Bernoulli variables. Nowadays these laws are treated in much more generality. My main interest was focused on Equations (8) through (11), which are equivalent, but subtly different, formulations of the Strong Law of Large Numbers. Furthermore, I have looked at “constructive” bounds related to the rate of convergence. Let me conclude with some sobering quotes (also to be found in [Chu74b, p. 233]). Feller writes in [Fel68, p. 152]:
“[The weak law of large numbers] is of very limited interest and should be replaced by the more precise and more useful strong law of large numbers.”
In [Wae71, p. 98], van der Waerden writes:
“[The strong law of large numbers] scarcely plays a role in mathemat- ical statistics.”
Acknowledgment I would like to thank Fred Steutel for discussing the issues in this paper and for pointing out reference [HR80].
References
[Chu74a] Kai Lai Chung. A Course in Probability Theory. Academic Press, second edition, 1974.
[Chu74b] Kai Lai Chung. Elementary Probability Theory and Stochastic Processes. Springer, 1974.
[Eis69] Martin Eisen. Introduction to Mathematical Probability Theory. Prentice-Hall, 1969.
[Fel68] William Feller. An Introduction to Probability Theory and Its Applications, volume I. Wiley, third edition, 1968.
[Fel71] William Feller. An Introduction to Probability Theory and Its Applications, volume II. Wiley, second edition, 1971.
[Hea71] C. R. Heathcote. Probability: Elements of the Mathematical Theory. Unwin, 1971.