Bayesian Optimal Predictive Model Selection: Median Probability Model and Prevalence Model, Study notes of Statistics

Notes on Bayesian optimal predictive model selection, focusing on the median probability model and the prevalence model. The author, Ernest Fokoué of The Ohio State University, explains the concepts of model space and optimality criterion, Bayesian predictive optimality, and sparse Bayesian learning. The document also covers optimal prediction via model space search and provides examples and applications.


Uploaded on 07/23/2009

Bayesian Optimal Predictive Model Selection
Ernest Fokoué (1)
(1) Department of Statistics, The Ohio State University
Kettering University
August, 2006
ERNEST PARFAIT FOKOUÉ Bayesian Optimal Predictive Model Selection

OUTLINE

1. Introduction to Predictive Model Selection
     Model Space and Optimality Criterion
     Bayesian Predictive Optimality
     Sparse Bayesian Learning

2. Optimal Prediction via Model Space Search
     The Median Probability Model
     The Prevalence Model

3. Examples, Conclusion and Extensions
     Examples and applications
     Conclusion and Extensions


DIFFERENT TYPES OF ATOMS AND EXPANSIONS

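Traditional linear model with $\mathbf{x} = (x_1, \cdots, x_p)^T \in \mathbb{R}^p$

$$f(\mathbf{x}) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j$$

Polynomial regression with $x \in \mathbb{R}$

$$f(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x^j$$

Kernel regression with $\mathbf{x} \in \mathbb{R}^p$

$$f(\mathbf{x}) = \beta_0 + \sum_{j=1}^{n} \beta_j K(\mathbf{x}, \mathbf{x}_j)$$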
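
The three expansions differ only in the atoms that multiply the coefficients. A minimal sketch, assuming a Gaussian kernel for the kernel regression case (the slide does not name a kernel; `gaussian_kernel` and its `width` parameter are illustrative choices, as are all function names):

```python
import math

def linear_expansion(x, beta0, beta):
    """f(x) = beta0 + sum_j beta_j * x_j for a vector input x."""
    return beta0 + sum(b * xj for b, xj in zip(beta, x))

def polynomial_expansion(x, beta0, beta):
    """f(x) = beta0 + sum_j beta_j * x**j for a scalar input x, j = 1..p."""
    return beta0 + sum(b * x ** j for j, b in enumerate(beta, start=1))

def gaussian_kernel(u, v, width=1.0):
    """Hypothetical kernel choice: exp(-||u - v||^2 / (2 * width^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def kernel_expansion(x, beta0, beta, centers, kernel=gaussian_kernel):
    """f(x) = beta0 + sum_j beta_j * K(x, x_j), one atom per training point."""
    return beta0 + sum(b * kernel(x, xj) for b, xj in zip(beta, centers))
```

Note that the kernel expansion has as many atoms as training points (n), whereas the other two have p atoms.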


DATA MATRIX AND FULL MODEL

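The full model can be written as

$$y = H\beta + \epsilon$$

where the data matrix $H$ is defined by

$$
H = \begin{pmatrix}
1 & h_1(\mathbf{x}_1) & h_2(\mathbf{x}_1) & \cdots & h_p(\mathbf{x}_1) \\
1 & h_1(\mathbf{x}_2) & h_2(\mathbf{x}_2) & \cdots & h_p(\mathbf{x}_2) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & h_1(\mathbf{x}_n) & h_2(\mathbf{x}_n) & \cdots & h_p(\mathbf{x}_n)
\end{pmatrix}
$$

and the other elements are

$$y = (y_1, \cdots, y_n)^T, \quad \beta = (\beta_0, \cdots, \beta_p)^T, \quad \epsilon = (\epsilon_1, \cdots, \epsilon_n)^T$$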
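
As a small illustration of how the data matrix is assembled, the sketch below builds H row by row from a list of atoms and evaluates the noise-free part of the full model. The atoms, inputs and coefficients are made up for the example, and the function names are hypothetical.

```python
def design_matrix(xs, basis_functions):
    """Row i is [1, h_1(x_i), ..., h_p(x_i)]; the leading 1 carries beta_0."""
    return [[1.0] + [h(x) for h in basis_functions] for x in xs]

def mean_response(H, beta):
    """Noise-free part of the full model: the vector H times beta."""
    return [sum(h_ij * b_j for h_ij, b_j in zip(row, beta)) for row in H]

# Hypothetical atoms h_1(x) = x and h_2(x) = x^2 on three scalar inputs.
xs = [0.0, 1.0, 2.0]
H = design_matrix(xs, [lambda x: x, lambda x: x ** 2])
beta = [1.0, 2.0, 0.5]          # (beta_0, beta_1, beta_2)
mu = mean_response(H, beta)     # y would be mu plus the noise vector
```

Each row of `H` has p + 1 entries, matching the length of the coefficient vector.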


OPTIMAL PREDICTIVE MODEL SELECTION

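Optimal predictive selection seeks to select from $\mathcal{M}$

$$M_v = \arg\min_{M_v \in \mathcal{M}} R(M_v)$$

where the risk function $R(M_v)$ is

$$R(M_v) = E\left[\ell\left(y^{new}, \hat{y}_v^{new}\right)\right]$$

The loss function is the squared error loss

$$\ell\left(y^{new}, \hat{y}_v^{new}\right) = \left(y^{new} - \hat{y}_v^{new}\right)^2$$

The estimated prediction is given by

$$\hat{y}_v^{new} = \sum_{j=1}^{K_v} \hat{\beta}_v^{(j)} h_v^{(j)}(\mathbf{x}^{new}) + \hat{\beta}_0$$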
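
The selection rule above can be sketched in a few lines if the expectation in the risk is replaced by an average over held-out pairs. That replacement is an assumption of this example, not something the slide prescribes, and the candidate models, coefficients and data below are made up.

```python
def predict(model, x_new):
    """y_hat = beta0 + sum_j beta_j * h_j(x_new) over the atoms kept by the model."""
    return model["beta0"] + sum(
        b * h(x_new) for b, h in zip(model["betas"], model["atoms"])
    )

def empirical_risk(model, new_data):
    """Average squared error over (x_new, y_new) pairs, a stand-in for R(M_v)."""
    return sum((y - predict(model, x)) ** 2 for x, y in new_data) / len(new_data)

def select_model(models, new_data):
    """Return the name of the candidate with the smallest empirical risk."""
    return min(models, key=lambda name: empirical_risk(models[name], new_data))

# Hypothetical candidates with already-estimated coefficients.
models = {
    "intercept_only": {"beta0": 2.0, "betas": [], "atoms": []},
    "linear": {"beta0": 0.0, "betas": [2.0], "atoms": [lambda x: x]},
}
new_data = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2)]
best = select_model(models, new_data)
```

With these numbers the linear candidate has the smaller average squared error, so it is the one returned.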


A MISLEADING INTUITION FOR MODEL SELECTION

Highest probability model

Intuition might suggest that the best predictive model is the model with the highest posterior probability

$$M_{v^*} = \arg\max_{M_v \in \mathcal{M}} p(M_v \mid y)$$

Some drawbacks of the highest probability model

Guaranteed to be correct only when there are just 2 models in $\mathcal{M}$.

Requires considering all the models in $\mathcal{M}$.

Not necessarily the best when $|\mathcal{M}| > 2$.

See Barbieri and Berger (2004) for details.
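
A toy calculation shows how the highest probability model can disagree with the median probability model listed in the outline, taken here in the Barbieri and Berger sense: the model made of the variables whose posterior inclusion probability is at least 1/2. The posterior probabilities below are made up, and the sketch only shows that the two selections can differ; it does not by itself establish which one predicts better.

```python
def highest_probability_model(post):
    """Model (a frozenset of variable indices) with the largest posterior mass."""
    return max(post, key=post.get)

def inclusion_probabilities(post, p):
    """Posterior probability that variable j is in the model, for j = 1..p."""
    return {j: sum(pr for m, pr in post.items() if j in m) for j in range(1, p + 1)}

def median_probability_model(post, p):
    """Variables whose posterior inclusion probability is at least 1/2."""
    incl = inclusion_probabilities(post, p)
    return frozenset(j for j, pr in incl.items() if pr >= 0.5)

# Made-up posterior model probabilities over p = 3 candidate variables.
post = {
    frozenset({1}): 0.40,
    frozenset({2, 3}): 0.35,
    frozenset({1, 2, 3}): 0.25,
}
hpm = highest_probability_model(post)
mpm = median_probability_model(post, 3)
```

Here the single-variable model {1} has the largest posterior probability, yet every variable has inclusion probability above 1/2, so the median probability model is the full model {1, 2, 3}.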


PREDICTION THROUGH BAYESIAN MODEL AVERAGING

Optimality of BMA prediction

It is a known result that, given a list of models, the Bayesian Model Average (BMA) prediction

$$\hat{y}^{new} = \sum_{k=1}^{2^p - 1} p(M_{v_k} \mid y)\, E[Y \mid M_{v_k}, \mathbf{x}^{new}]$$

is optimal under squared error loss.

But ...

Model description is lost in the averages.

Computationally prohibitive.
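
Both points can be made concrete with a short sketch: the BMA prediction is a posterior-weighted average of per-model predictions, and the number of terms in the sum grows as 2^p - 1. The weights and predictions below are made up, and the function names are hypothetical.

```python
def bma_prediction(post_probs, model_predictions):
    """Posterior-weighted average of the per-model predictions E[Y | M, x_new]."""
    return sum(w * pred for w, pred in zip(post_probs, model_predictions))

def number_of_submodels(p):
    """Count of non-empty subsets of p atoms, the upper limit 2^p - 1 of the sum."""
    return 2 ** p - 1

post_probs = [0.40, 0.35, 0.25]       # made-up p(M | y), summing to 1
model_predictions = [1.0, 2.0, 4.0]   # made-up E[Y | M, x_new]
y_hat = bma_prediction(post_probs, model_predictions)
```

The averaged value `y_hat` does not correspond to any single model, which is the sense in which model description is lost. With only p = 30 atoms, `number_of_submodels(30)` already exceeds one billion.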


GENERAL APPROACH TO SPARSITY

What’s the intuition in sparsity?

Likelihood under Gaussian noise with isotropic variance $\sigma^2$

$$p(y \mid \beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \| y - H\beta \|^2 \right\}$$

Sparsity intuitively means

Constrain the space of $\beta$ so that many $\beta_i$ are zero.

Sparsity naturally achieved by

$\ell_1$ norm on $\beta$ or double exponential prior over $\beta$.
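
To see why an l1 penalty produces exact zeros, consider the special case where the columns of H are orthonormal. Under that assumption (made only for this sketch, not stated on the slide), the minimizer of 0.5 * ||y - H b||^2 + lam * ||b||_1 is obtained by soft-thresholding each coordinate of z = H^T y. The values of `z` and `lam` are made up.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam and set it to exactly zero when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def l1_solution_orthonormal(z, lam):
    """Minimizer of 0.5*||y - H b||^2 + lam*||b||_1 when H^T H = I and z = H^T y."""
    return [soft_threshold(zj, lam) for zj in z]

z = [3.0, -0.2, 0.4, -2.5]            # made-up least-squares coefficients H^T y
beta_sparse = l1_solution_orthonormal(z, lam=0.5)
```

Coefficients smaller in magnitude than `lam` are set to exactly zero, while the larger ones are only shrunk. The same objective arises as the negative log posterior when the Gaussian likelihood is combined with a double exponential prior, with `lam` absorbing the noise variance and the prior scale.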


RELEVANCE VECTOR REGRESSION (TIPPING, 2000)

Automatic relevance determination (ARD) hyperprior for sparsity induction

$$\alpha_i \sim \text{gamma}(a, b)$$

Marginal prior on $\beta_i$

$$p(\beta_i) = \int p(\beta_i \mid \alpha_i)\, p(\alpha_i)\, d\alpha_i$$
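
The integral can be checked numerically. The sketch below assumes the usual RVM conditional prior, a zero-mean Gaussian on each coefficient with precision alpha_i, and a gamma density parameterized by shape a and rate b; neither convention is spelled out on the slide. Under those assumptions the integral has a closed form, a Student-t density with 2a degrees of freedom and scale sqrt(b/a), which the code compares against a midpoint-rule approximation.

```python
import math

def gaussian_pdf(beta, alpha):
    """N(beta | 0, 1/alpha): alpha is the precision of the coefficient."""
    return math.sqrt(alpha / (2.0 * math.pi)) * math.exp(-0.5 * alpha * beta ** 2)

def gamma_pdf(alpha, a, b):
    """Gamma density with shape a and rate b (the rate convention is assumed)."""
    log_pdf = a * math.log(b) - math.lgamma(a) + (a - 1.0) * math.log(alpha) - b * alpha
    return math.exp(log_pdf)

def marginal_prior_numeric(beta, a, b, upper=60.0, steps=60000):
    """Midpoint-rule approximation of the integral over alpha in (0, upper)."""
    h = upper / steps
    return h * sum(
        gaussian_pdf(beta, (k + 0.5) * h) * gamma_pdf((k + 0.5) * h, a, b)
        for k in range(steps)
    )

def student_t_pdf(beta, a, b):
    """Student-t density with 2a degrees of freedom and scale sqrt(b / a)."""
    nu, scale = 2.0 * a, math.sqrt(b / a)
    log_norm = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return math.exp(log_norm - 0.5 * (nu + 1.0) * math.log(1.0 + (beta / scale) ** 2 / nu))
```

For example, with a = 2 and b = 1, `marginal_prior_numeric(0.7, 2.0, 1.0)` and `student_t_pdf(0.7, 2.0, 1.0)` should agree to several decimal places. The truncation point `upper` and the step count are tuning choices for this illustration.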


SPARSITY OF THE RVM

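How does a Gaussian prior end up sparse?

The marginal prior distribution of $\beta_i$ is

$$p(\beta_i) = \text{Student-}t \ (\text{driven by } a \text{ and } b)$$

The marginal prior $p(\beta)$ is therefore a product of Student-t densities, which concentrates mass near zero along the coordinate axes while keeping heavy tails, and is therefore a good device for sparsity.

Sparsity is controlled through the hyperparameters $a$ and $b$.

Integrating $\beta$ out leaves a distribution that depends on $\alpha$, and therefore a device for controlling sparsity through type-II maximum likelihood (ML-II).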
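
The last point can be written out as follows. This is a sketch of the standard relevance vector machine marginal likelihood, assuming a zero-mean Gaussian conditional prior on the coefficients with one precision per coefficient (including the intercept) and the Gaussian likelihood of the full model; the symbols $A$ and $I_n$ are introduced here for the example.

```latex
% Assumes \beta \mid \alpha \sim N(0, A^{-1}) with
% A = \mathrm{diag}(\alpha_0, \dots, \alpha_p), and
% y \mid \beta, \sigma^2 \sim N(H\beta, \sigma^2 I_n).
\begin{align*}
p(y \mid \alpha, \sigma^2)
  &= \int p(y \mid \beta, \sigma^2)\, p(\beta \mid \alpha)\, d\beta \\
  &= N\!\left(y \mid 0,\; \sigma^2 I_n + H A^{-1} H^{T}\right), \\
\hat{\alpha}
  &= \arg\max_{\alpha}\; \log p(y \mid \alpha, \sigma^2).
\end{align*}
```

In this formulation a coefficient whose estimated precision grows very large is effectively pinned at zero, so the corresponding atom drops out of the model.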