














































These notes cover Bayesian optimal predictive model selection, focusing on the median probability model and the prevalence model. The author, Ernest Fokoué of The Ohio State University, explains the concepts of model space and optimality criterion, Bayesian predictive optimality, and sparse Bayesian learning. The notes also cover optimal prediction via model space search and provide examples and applications.
Ernest Fokoué
Department of Statistics, The Ohio State University
Kettering University
August, 2006
Outline
1. Introduction to Predictive Model Selection
   Model Space and Optimality Criterion
   Bayesian Predictive Optimality
   Sparse Bayesian Learning
2. Optimal Prediction via Model Space Search
   The Median Probability Model
   The Prevalence Model
3. Examples, Conclusion and Extensions
   Examples and applications
   Conclusion and Extensions
Model Space and Optimality Criterion
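Traditional linear model with $\mathbf{x} = (x_1, \cdots, x_p)^T \in \mathbb{R}^p$
$$f(\mathbf{x}) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j$$
Polynomial regression with $x \in \mathbb{R}$
$$f(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x^j$$
Kernel regression with $\mathbf{x} \in \mathbb{R}^p$
$$f(\mathbf{x}) = \beta_0 + \sum_{j=1}^{n} \beta_j K(\mathbf{x}, \mathbf{x}_j)$$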
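The full model can be written as
$$\mathbf{y} = H\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
where the data matrix $H$ is defined by
$$H = \begin{pmatrix}
1 & h_1(\mathbf{x}_1) & h_2(\mathbf{x}_1) & \cdots & h_p(\mathbf{x}_1) \\
1 & h_1(\mathbf{x}_2) & h_2(\mathbf{x}_2) & \cdots & h_p(\mathbf{x}_2) \\
\vdots & \vdots & \vdots & & \vdots \\
1 & h_1(\mathbf{x}_n) & h_2(\mathbf{x}_n) & \cdots & h_p(\mathbf{x}_n)
\end{pmatrix}$$
and the other elements are
$$\mathbf{y} = (y_1, \cdots, y_n)^T, \quad \boldsymbol{\beta} = (\beta_0, \cdots, \beta_p)^T, \quad \boldsymbol{\epsilon} = (\epsilon_1, \cdots, \epsilon_n)^T$$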
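As an illustration of how the matrix $H$ is assembled from basis functions, here is a minimal Python sketch. The helper names (`design_matrix`, `polynomial_basis`, `rbf_kernel_basis`), the Gaussian (RBF) choice of kernel $K$, and the synthetic data are all assumptions made for the example; the slides do not specify a kernel or an implementation.

```python
import numpy as np

def design_matrix(X, basis_functions):
    """Stack a column of ones and h_1(x), ..., h_p(x) evaluated at each row of X."""
    X = np.asarray(X, dtype=float)
    columns = [np.ones(len(X))]
    columns += [np.array([h(x) for x in X]) for h in basis_functions]
    return np.column_stack(columns)

def polynomial_basis(p):
    """Polynomial regression on scalar inputs: h_j(x) = x**j for j = 1..p."""
    return [lambda x, j=j: x ** j for j in range(1, p + 1)]

def rbf_kernel_basis(X_train, width=1.0):
    """Kernel regression: h_j(x) = K(x, x_j), one basis function per training point.
    The Gaussian kernel used here is an assumed choice of K."""
    X_train = np.atleast_2d(np.asarray(X_train, dtype=float))
    return [lambda x, c=c: np.exp(-np.sum((x - c) ** 2) / (2 * width ** 2))
            for c in X_train]

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 5)

# Polynomial case: n = 5 rows, p = 3 basis functions plus the intercept column.
H_poly = design_matrix(x, polynomial_basis(3))

# Kernel case: n = 5 rows, n = 5 basis functions plus the intercept column.
X_col = x.reshape(-1, 1)
H_kern = design_matrix(X_col, rbf_kernel_basis(X_col))

# Simulate y = H beta + eps with made-up coefficients and noise level.
beta = np.array([1.0, 0.5, 0.0, -2.0])
y = H_poly @ beta + 0.1 * rng.standard_normal(len(x))
```

Note that in the kernel case the number of columns grows with the sample size, which is one reason sparsity in $\boldsymbol{\beta}$ becomes attractive.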
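Optimal predictive selection seeks to select from the model space $\mathcal{M}$ the model
$$M_{v^*} = \arg\min_{M_v \in \mathcal{M}} R(M_v)$$
where the risk function $R(M_v)$ is
$$R(M_v) = E\left[\ell\left(y^{\text{new}}, \hat{y}^{\text{new}}_v\right)\right]$$
The loss function is the squared error loss
$$\ell\left(y^{\text{new}}, \hat{y}^{\text{new}}_v\right) = \left(y^{\text{new}} - \hat{y}^{\text{new}}_v\right)^2$$
The estimated prediction is given by
$$\hat{y}^{\text{new}}_v = \sum_{j=1}^{K_v} \hat{\beta}^{(j)}_v h^{(j)}_v(\mathbf{x}^{\text{new}}) + \hat{\beta}_0$$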
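The selection rule above can be sketched numerically by replacing the expectation in $R(M_v)$ with an average squared error on held-out data. The sketch below makes several assumptions that are not in the slides: coefficients are estimated by ordinary least squares rather than by a Bayesian posterior mean, the risk is estimated on a simple train/holdout split, and the data are synthetic. The function names are hypothetical.

```python
import itertools
import numpy as np

def fit_submodel(H, y, v):
    """Least-squares fit using the intercept plus the columns listed in v."""
    cols = [0] + list(v)
    coef, *_ = np.linalg.lstsq(H[:, cols], y, rcond=None)
    return cols, coef

def estimated_risk(H_train, y_train, H_new, y_new, v):
    """Average squared error loss of submodel v on held-out data."""
    cols, coef = fit_submodel(H_train, y_train, v)
    y_hat = H_new[:, cols] @ coef
    return float(np.mean((y_new - y_hat) ** 2))

def best_predictive_submodel(H_train, y_train, H_new, y_new):
    """Enumerate every subset of the p non-intercept columns and return the arg min."""
    p = H_train.shape[1] - 1
    candidates = [v for k in range(p + 1)
                  for v in itertools.combinations(range(1, p + 1), k)]
    risks = {v: estimated_risk(H_train, y_train, H_new, y_new, v)
             for v in candidates}
    return min(risks, key=risks.get), risks

# Synthetic data (assumed): only columns 1 and 3 carry signal.
rng = np.random.default_rng(1)
n, p = 200, 4
H = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
beta = np.array([1.0, 2.0, 0.0, -3.0, 0.0])
y = H @ beta + 0.5 * rng.standard_normal(n)

v_star, risks = best_predictive_submodel(H[:100], y[:100], H[100:], y[100:])
```

Exhaustive enumeration visits $2^p$ submodels, so this brute-force version is only practical for small $p$.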
Bayesian Predictive Optimality
Highest probability model
Intuition might suggest that the best predictive model is the model with the highest posterior probability,
$$M_{v^*} = \arg\max_{M_v \in \mathcal{M}} p(M_v \mid \mathbf{y})$$
Some drawbacks of the highest probability model
Optimal only when there are exactly 2 models in $\mathcal{M}$.
Requires considering all the models in $\mathcal{M}$.
Not necessarily the best when $|\mathcal{M}| > 2$.
See Barbieri and Berger (2004) for details.
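A toy calculation shows why the highest probability model can fail with more than two models. Under squared error loss, the posterior expected loss of predicting with a single value $c$ equals the predictive variance plus $(c - \hat{y}_{\text{BMA}})^2$, where $\hat{y}_{\text{BMA}}$ is the model-averaged prediction, so the best single model is the one whose prediction is closest to the average. The probabilities and predictions below are hypothetical numbers chosen for illustration.

```python
import numpy as np

# Hypothetical posterior model probabilities and point predictions at x_new.
posterior = np.array([0.4, 0.3, 0.3])
prediction = np.array([0.0, 1.0, 1.0])

# Model-averaged prediction: 0.4*0 + 0.3*1 + 0.3*1.
bma = float(posterior @ prediction)

# The highest probability model is model 0.
highest_prob = int(np.argmax(posterior))

# Excess expected squared error of each single model over the averaged prediction.
excess_risk = (prediction - bma) ** 2

# The best predictive single model is the one closest to the average.
best_predictive = int(np.argmin(excess_risk))
```

Here the most probable model predicts 0 while the two less probable models agree on 1, so either of them is closer to the averaged prediction than the highest probability model is.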
Optimality of BMA prediction
It is a known result that, given a list of models, the Bayes Model Average (BMA) prediction
$$\hat{y}^{\text{new}} = \sum_{k=1}^{2^p - 1} p(M_{v_k} \mid \mathbf{y}) \, E[Y \mid M_{v_k}, \mathbf{x}^{\text{new}}]$$
is optimal.
But ...
Model description is lost in the averages.
Computationally prohibitive.
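The BMA sum can be sketched for small $p$ by brute-force enumeration. The version below is an approximation under stated assumptions, not the method of the slides: posterior model probabilities are replaced by normalised $\exp(-\text{BIC}/2)$ weights (which presumes equal prior probability for every model), each $E[Y \mid M_{v_k}, \mathbf{x}^{\text{new}}]$ is replaced by a least-squares prediction, and the data are synthetic. The function names are hypothetical.

```python
import itertools
import numpy as np

def bic_model_weights(H, y):
    """Approximate p(M_v | y) by exp(-BIC_v / 2), normalised over all
    2**p - 1 non-empty subsets of the non-intercept columns."""
    n, p = H.shape[0], H.shape[1] - 1
    models, log_w, fits = [], [], []
    for k in range(1, p + 1):
        for v in itertools.combinations(range(1, p + 1), k):
            cols = [0] + list(v)
            coef, *_ = np.linalg.lstsq(H[:, cols], y, rcond=None)
            rss = float(np.sum((y - H[:, cols] @ coef) ** 2))
            bic = n * np.log(rss / n) + len(cols) * np.log(n)
            models.append(v)
            log_w.append(-0.5 * bic)
            fits.append((cols, coef))
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return models, w / w.sum(), fits

def bma_predict(h_new, weights, fits):
    """Weighted average of the per-model predictions at the feature vector h_new."""
    return float(sum(w * (h_new[cols] @ coef)
                     for w, (cols, coef) in zip(weights, fits)))

# Synthetic data (assumed): only columns 1 and 3 carry signal.
rng = np.random.default_rng(2)
n, p = 200, 4
H = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
beta = np.array([1.0, 2.0, 0.0, -3.0, 0.0])
y = H @ beta + 0.5 * rng.standard_normal(n)

models, weights, fits = bic_model_weights(H, y)
h_new = np.ones(p + 1)                 # true mean at this point is 1 + 2 - 3 = 0
y_bma = bma_predict(h_new, weights, fits)
top_model = models[int(np.argmax(weights))]
```

The loop makes the computational objection concrete: the number of fits doubles with every added predictor, and the final prediction no longer corresponds to any single interpretable model.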
Sparse Bayesian Learning
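What's the intuition in sparsity?
Likelihood under Gaussian noise with isotropic variance $\sigma^2$
$$p(\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \|\mathbf{y} - H\boldsymbol{\beta}\|^2 \right\}$$
Sparsity intuitively means
Constrain the space of $\boldsymbol{\beta}$ so that many $\beta_i$ are zero.
Sparsity naturally achieved by
An $\ell_1$ norm on $\boldsymbol{\beta}$ or a double exponential prior over $\boldsymbol{\beta}$.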
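To see why an $\ell_1$ norm produces exact zeros, consider the simplified case of an orthonormal design ($H^T H = I$), which is an assumption made here for illustration. The $\ell_1$-penalised least-squares solution, which is also the posterior mode under a double exponential prior, then reduces to soft-thresholding each least-squares coefficient. A quadratic ($\ell_2$) penalty, shown for comparison, only rescales the coefficients. The function names and numbers are hypothetical.

```python
import numpy as np

def soft_threshold(b, lam):
    """Elementwise minimiser of 0.5*(b - beta)**2 + lam*|beta|."""
    b = np.asarray(b, dtype=float)
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def ridge_shrink(b, lam):
    """Elementwise minimiser of 0.5*(b - beta)**2 + 0.5*lam*beta**2, for comparison."""
    return np.asarray(b, dtype=float) / (1.0 + lam)

# Made-up least-squares coefficients under an orthonormal design.
b_ols = np.array([3.0, -0.5, 1.2, 0.1])

beta_l1 = soft_threshold(b_ols, 1.0)  # coefficients with |b| <= 1 become exactly zero
beta_l2 = ridge_shrink(b_ols, 1.0)    # every coefficient shrinks but stays non-zero
```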
Automatic relevance determination (ARD) hyperprior for sparsity induction
$$\alpha_i \sim \text{gamma}(a, b)$$
Marginal prior on $\beta_i$
$$p(\beta_i) = \int p(\beta_i \mid \alpha_i)\, p(\alpha_i)\, d\alpha_i$$
How does a Gaussian prior end up sparse?
The marginal prior distribution of $\beta_i$ is
$$p(\beta_i) = \text{Student-}t \ (\text{driven by } a \text{ and } b)$$
The marginal prior $p(\boldsymbol{\beta})$ is therefore a product of Student-t densities, and therefore a good device for sparsity.
Sparsity is controlled through the hyperparameters $a$ and $b$.
Integrating $\boldsymbol{\beta}$ out leaves us with an $\alpha$-dependent distribution, and therefore a device for controlling sparsity through type-II maximum likelihood (ML-II).
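The Student-t claim can be checked numerically. The sketch below assumes the usual ARD conditional prior $\beta_i \mid \alpha_i \sim N(0, \alpha_i^{-1})$, which is not shown on the slides above, and assumes $b$ is the rate parameter of the gamma hyperprior. Under those assumptions the integral has the closed form $p(\beta_i) = \frac{b^a\,\Gamma(a + 1/2)}{\sqrt{2\pi}\,\Gamma(a)} \left(b + \beta_i^2/2\right)^{-(a + 1/2)}$, a Student-t density with $2a$ degrees of freedom and scale $\sqrt{b/a}$. The function names and the values of $a$ and $b$ are hypothetical.

```python
import math
import numpy as np

def marginal_prior_numeric(beta, a, b, upper=200.0, steps=400_001):
    """Integrate N(beta | 0, 1/alpha) * Gamma(alpha | a, rate=b) over alpha
    with a plain trapezoid rule. Assumes a >= 1 so the integrand is finite at 0."""
    alpha = np.linspace(0.0, upper, steps)
    normal = np.sqrt(alpha / (2.0 * math.pi)) * np.exp(-0.5 * alpha * beta ** 2)
    gamma = b ** a / math.gamma(a) * alpha ** (a - 1.0) * np.exp(-b * alpha)
    f = normal * gamma
    h = alpha[1] - alpha[0]
    return float(h * (f[:-1] + f[1:]).sum() / 2.0)

def marginal_prior_closed_form(beta, a, b):
    """Student-t density with 2a degrees of freedom and scale sqrt(b/a)."""
    const = b ** a * math.gamma(a + 0.5) / (math.gamma(a) * math.sqrt(2.0 * math.pi))
    return const * (b + 0.5 * beta ** 2) ** (-(a + 0.5))

a, b = 2.0, 1.0  # illustrative hyperparameter values
checks = [(beta, marginal_prior_numeric(beta, a, b), marginal_prior_closed_form(beta, a, b))
          for beta in (0.0, 0.7, 2.5)]
```

The polynomial tails of this density put far more mass on large $|\beta_i|$ than a Gaussian with the same central peak does, which is how a conditionally Gaussian prior can still favour sparse solutions.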