
Djurić, P.M. & Kay, S.M. “Spectrum Estimation and Modeling”

Digital Signal Processing Handbook

Ed. Vijay K. Madisetti and Douglas B. Williams

Boca Raton: CRC Press LLC, 1999

Spectrum Estimation and Modeling

Petar M. Djurić, State University of New York at Stony Brook

Steven M. Kay, University of Rhode Island

14.1 Introduction
14.2 Important Notions and Definitions: Random Processes • Spectra of Deterministic Signals • Spectra of Random Processes
14.3 The Problem of Power Spectrum Estimation
14.4 Nonparametric Spectrum Estimation: Periodogram • The Bartlett Method • The Welch Method • Blackman-Tukey Method • Minimum Variance Spectrum Estimator • Multiwindow Spectrum Estimator
14.5 Parametric Spectrum Estimation: Spectrum Estimation Based on Autoregressive Models • Spectrum Estimation Based on Moving Average Models • Spectrum Estimation Based on Autoregressive Moving Average Models • Pisarenko Harmonic Decomposition Method • Multiple Signal Classification (MUSIC)
14.6 Recent Developments
References

14.1 Introduction

The main objective of spectrum estimation is the determination of the power spectral density (PSD) of a random process. The PSD is a function that plays a fundamental role in the analysis of stationary random processes in that it quantifies the distribution of the total power as a function of frequency. The estimation of the PSD is based on a set of observed data samples from the process. A necessary assumption is that the random process is at least wide-sense stationary, that is, its first and second order statistics do not change with time. The estimated PSD provides information about the structure of the random process, which can then be used for refined modeling, prediction, or filtering of the observed process.

Spectrum estimation has a long history, with beginnings in ancient times [17]. The first significant discoveries that laid the groundwork for later developments, however, were made in the early years of the nineteenth century. They include one of the most important advances in the history of mathematics, Fourier’s theory, according to which an arbitrary function can be represented by an infinite summation of sine and cosine functions. Later came the Sturm-Liouville spectral theory of differential equations, which was followed by the spectral representations in quantum and classical physics developed by John von Neumann and Norbert Wiener, respectively. The statistical theory of spectrum estimation started practically in 1949, when Tukey introduced a numerical method for computation of spectra from empirical data. A very important milestone for further development of the field was the reinvention of the fast Fourier transform (FFT) in 1965, an efficient algorithm for computation of the discrete Fourier transform. Shortly thereafter came the work of John Burg, who

where the amplitude a and the frequency f₀ are constants, and the phase θ̃ is a random variable that is uniformly distributed over the interval (−π, π), one can show that

$$E(\tilde{x}[n]) = 0 \qquad (14.4)$$

and

$$r[n, n+k] = E\left\{ \tilde{x}^*[n]\, \tilde{x}[n+k] \right\} = \frac{a^2}{2} \cos(2\pi f_0 k). \qquad (14.5)$$

Thus, Eq. (14.3) represents a wide-sense stationary random process.

14.2.2 Spectra of Deterministic Signals

Before we define the concept of the spectrum of a random process, it will be useful to review the analogous concept for deterministic signals, which are signals whose future values can be exactly determined without any uncertainty. Besides their description in the time domain, deterministic signals have a very useful representation in terms of a superposition of sinusoids with various frequencies, which is given by the discrete-time Fourier transform (DTFT). If the observed signal {g[n]} is not periodic, its DTFT is the complex-valued function G(f) defined by

$$G(f) = \sum_{n=-\infty}^{\infty} g[n]\, e^{-j 2\pi f n} \qquad (14.6)$$

where $j = \sqrt{-1}$, f is the normalized frequency, 0 ≤ f < 1, and $e^{j 2\pi f n}$ is the complex exponential given by
$$e^{j 2\pi f n} = \cos(2\pi f n) + j \sin(2\pi f n). \qquad (14.7)$$

The sum in Eq. (14.6) converges uniformly to a continuous function of the frequency f if

$$\sum_{n=-\infty}^{\infty} |g[n]| < \infty. \qquad (14.8)$$

The signal {g[n]} can be determined from G(f ) by the inverse DTFT defined by

$$g[n] = \int_{0}^{1} G(f)\, e^{j 2\pi f n}\, df \qquad (14.9)$$

which means that the signal {g[n]} can be represented in terms of complex exponentials whose frequencies span the continuous interval [0,1). The complex function G(f ) can be alternatively expressed as

$$G(f) = |G(f)|\, e^{j\phi(f)} \qquad (14.10)$$

where |G(f )| is called the amplitude spectrum of {g[n]}, and φ(f ) the phase spectrum of {g[n]}. For example, if the signal {g[n]} is given by

$$g[n] = \begin{cases} 1, & n = 1 \\ 0, & n \neq 1 \end{cases} \qquad (14.11)$$

then
$$G(f) = e^{-j 2\pi f} \qquad (14.12)$$

and the amplitude and phase spectra are

$$|G(f)| = 1, \quad 0 \le f < 1; \qquad \phi(f) = -2\pi f, \quad 0 \le f < 1. \qquad (14.13)$$

The total energy of the signal is given by

$$E = \sum_{n=-\infty}^{\infty} |g[n]|^2 \qquad (14.14)$$

and according to Parseval’s theorem, it can also be obtained from the amplitude spectrum of the signal, i.e.,
$$\sum_{n=-\infty}^{\infty} |g[n]|^2 = \int_{0}^{1} |G(f)|^2\, df. \qquad (14.15)$$

From Eq. (14.15), we deduce that |G(f)|² df is the contribution to the total energy of the signal from the frequency band (f, f + df). Therefore, we say that |G(f)|² represents the energy density spectrum of the signal {g[n]}. When {g[n]} is periodic with period N, that is,

$$g[n] = g[n + N] \qquad (14.16)$$

for all n, we use the discrete Fourier transform (DFT) to express {g[n]} in the frequency domain, that is,

$$G(f_k) = \sum_{n=0}^{N-1} g[n]\, e^{-j 2\pi f_k n}, \qquad f_k = \frac{k}{N}, \quad k \in \{0, 1, \cdots, N-1\}. \qquad (14.17)$$

Note that the frequency here takes values from a discrete set. The inverse DFT is defined by

$$g[n] = \frac{1}{N} \sum_{k=0}^{N-1} G(f_k)\, e^{j 2\pi f_k n}, \qquad f_k = \frac{k}{N}. \qquad (14.18)$$

Now Parseval’s relation becomes

$$\sum_{n=0}^{N-1} |g[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |G(f_k)|^2, \qquad f_k = \frac{k}{N} \qquad (14.19)$$

where the two sides are the total energy of the signal in one period. If we define the average power of the discrete-time signal by

$$P = \frac{1}{N} \sum_{n=0}^{N-1} |g[n]|^2 \qquad (14.20)$$

then from Eq. (14.19)

$$P = \frac{1}{N^2} \sum_{k=0}^{N-1} |G(f_k)|^2, \qquad f_k = \frac{k}{N}. \qquad (14.21)$$

Thus, |G(f_k)|²/N² is the contribution to the total power from the term with frequency f_k, and so it represents the power spectrum “density” of {g[n]}. For example, if the periodic signal in one period is defined by

$$g[n] = \begin{cases} 1, & n = 0 \\ 0, & n = 1, 2, \cdots, N-1 \end{cases} \qquad (14.22)$$

random process has a line spectrum. In this case, a useful representation of the spectrum is given by the Dirac δ-functions,

$$P(f) = \sum_{k} P_k\, \delta(f - f_k) \qquad (14.29)$$

where P_k is the power associated with the k-th line component. Finally, the spectrum of a random process may be mixed if it is a combination of continuous and line spectra. Then P(f) is a superposition of a continuous function of f and δ-functions.

14.3 The Problem of Power Spectrum Estimation

The problem of power spectrum estimation can be stated as follows: Given a set of N samples {x[0], x[1], ..., x[N−1]} of a realization of the random process {x̃[n]}, also denoted by {x[n]}_0^{N−1}, estimate the PSD of the random process, P(f). Obviously this task amounts to the estimation of a function and is distinct from the typical problem in elementary statistics, where the goal is to estimate a finite set of parameters. Spectrum estimation methods can be classified into two categories: nonparametric and parametric. The nonparametric approaches do not assume any specific parametric model for the PSD. They are based solely on the estimate of the autocorrelation sequence of the random process from the observed data. For the parametric approaches, on the other hand, we first postulate a model for the process of interest, where the model is described by a small number of parameters. Based on the model, the PSD of the process can be expressed in terms of the model parameters. The PSD estimate is then obtained by substituting the estimated parameters of the model in the expression for the PSD. For example, if a random process {x̃[n]} can be modeled by

$$\tilde{x}[n] = -a\, \tilde{x}[n-1] + \tilde{w}[n] \qquad (14.30)$$

where a is an unknown parameter and {w̃[n]} is a zero mean wide-sense stationary random process whose random variables are uncorrelated and have the same variance σ², it can be shown that the PSD of {x̃[n]} is

$$P(f) = \frac{\sigma^2}{\left| 1 + a\, e^{-j 2\pi f} \right|^2}. \qquad (14.31)$$

Thus, to find P(f) it is sufficient to estimate a and σ². The performance of a PSD estimator is evaluated by several measures of goodness. One is the bias of the estimator, defined by

$$b(f) = E\left\{ \hat{P}(f) \right\} - P(f) \qquad (14.32)$$

where P̂(f) and P(f) are the estimated and true PSD, respectively. If the bias b(f) is identically equal to zero for all f, the estimator is said to be unbiased, which means that on average it yields the true PSD. Among the unbiased estimators, we search for the one that has minimal variability. The variability is measured by the variance of the estimator,

$$v(f) = E\left\{ \left[ \hat{P}(f) - E\big(\hat{P}(f)\big) \right]^2 \right\}. \qquad (14.33)$$

A measure that combines the bias and the variance is the relative mean square error given by [15]

$$\nu(f) = \frac{v(f) + b^2(f)}{P(f)}. \qquad (14.34)$$

The variability of a PSD estimator is also measured by the normalized variance [8]

$$\psi(f) = \frac{v(f)}{E^2\big(\hat{P}(f)\big)}. \qquad (14.35)$$

Finally, another important metric for comparison is the resolution of a PSD estimator. It corresponds to the ability of the estimator to reproduce the fine details of the PSD of the random process. For example, if the PSD of the random process has two peaks at frequencies f₁ and f₂, the resolution of the estimator is measured by the minimum separation of f₁ and f₂ for which the estimator still reproduces two peaks at f₁ and f₂.
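These measures are straightforward to probe numerically. Below is a minimal Python sketch (not from the chapter) that estimates b(f), v(f), and ν(f) for an arbitrary PSD estimator by Monte Carlo simulation; the argument names estimator, make_realization, and true_psd are hypothetical placeholders.

```python
import numpy as np

def mc_bias_variance(estimator, make_realization, true_psd, trials=500):
    """Monte Carlo check of the bias (14.32), variance (14.33), and relative
    MSE (14.34) of a PSD estimator on a fixed frequency grid."""
    est = np.array([estimator(make_realization()) for _ in range(trials)])
    b = est.mean(axis=0) - true_psd        # bias b(f)
    v = est.var(axis=0)                    # variance v(f)
    nu = (v + b ** 2) / true_psd           # relative mean square error nu(f)
    return b, v, nu
```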

14.4 Nonparametric Spectrum Estimation

When the method for PSD estimation is not based on any assumptions about the generation of the observed samples other than wide-sense stationarity, it is termed a nonparametric estimator. According to Eq. (14.28), P(f) can be obtained by first estimating the autocorrelation sequence from the observed samples x[0], x[1], ..., x[N−1], and then applying the DTFT to these estimates. One estimator of the autocorrelation is given by

$$\hat{r}[k] = \frac{1}{N} \sum_{n=0}^{N-1-k} x^*[n]\, x[n+k], \qquad 0 \le k \le N-1. \qquad (14.36)$$

The estimates of r̂[k] for −N < k < 0 are obtained from the identity

$$\hat{r}[-k] = \hat{r}^*[k] \qquad (14.37)$$

and those for |k| ≥ N are set equal to zero. This estimator, although biased, has been preferred over others. An important reason for favoring it is that it always yields nonnegative estimates of the PSD, which is not the case with the unbiased estimator. Many nonparametric estimators rely on Eq. (14.36) and then transform the obtained autocorrelation sequence to estimate the PSD. Other nonparametric methods, however, operate directly on the observed data.
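For concreteness, here is a short Python sketch of the biased estimator of Eqs. (14.36) and (14.37); the helper name autocorr_biased is our own.

```python
import numpy as np

def autocorr_biased(x, max_lag):
    """Biased autocorrelation estimate, Eq. (14.36); negative lags follow
    from Eq. (14.37), r-hat[-k] = conj(r-hat[k])."""
    x = np.asarray(x)
    N = len(x)
    return np.array([np.sum(np.conj(x[:N - k]) * x[k:]) / N
                     for k in range(max_lag + 1)])
```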

14.4.1 Periodogram

The periodogram was introduced by Schuster in 1898 when he was searching for hidden periodicities in sunspot data [19]. To find the periodogram of the data {x[n]}_0^{N−1}, we first determine the estimated autocorrelation sequence r̂[k] for −(N−1) ≤ k ≤ N−1 and then take the DTFT, i.e.,

$$\hat{P}_{\mathrm{PER}}(f) = \sum_{k=-(N-1)}^{N-1} \hat{r}[k]\, e^{-j 2\pi f k}. \qquad (14.38)$$

It is more convenient to write the periodogram directly in terms of the observed samples x[n]. It is then defined as

$$\hat{P}_{\mathrm{PER}}(f) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi f n} \right|^2. \qquad (14.39)$$

Thus, the periodogram is proportional to the squared magnitude of the DTFT of the observed data. In practice, the periodogram is calculated by applying the FFT, which computes it at a discrete set of
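As an illustration of that practice, the following Python sketch evaluates Eq. (14.39) on the FFT frequency grid f_k = k/nfft; the zero-padding length nfft is our own choice, not prescribed by the text.

```python
import numpy as np

def periodogram(x, nfft=None):
    """Periodogram of Eq. (14.39), computed on the grid f_k = k/nfft via the FFT."""
    x = np.asarray(x)
    N = len(x)
    nfft = nfft or N
    X = np.fft.fft(x, nfft)          # DTFT of the data sampled at f_k = k/nfft
    return np.abs(X) ** 2 / N        # hat{P}_PER(f_k)

# example: a sinusoid in white noise; zero-padding refines the frequency grid
n = np.arange(256)
x = np.cos(2 * np.pi * 0.2 * n) + 0.5 * np.random.randn(256)
P = periodogram(x, nfft=1024)
```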

where W_R(f) is the DTFT of the rectangular window. Hence, the mean value of the periodogram is a smeared version of the true PSD. Since the implementation of the periodogram as defined in Eq. (14.44) implies the use of a rectangular window, a question arises as to whether we could use a window of a different shape to reduce the variance of the periodogram. The answer is yes, and indeed many windows have been proposed that weight the data samples in the middle of the observed record more than those towards its ends. Some frequently used alternatives to the rectangular window are the windows of Bartlett, Hanning, Hamming, and Blackman. The magnitude of the DTFT of a window provides two important characteristics about it: one is the width of the window’s mainlobe, and the other is the strength of its sidelobes. A narrow mainlobe allows for better resolution, and low sidelobes improve the smoothing of the estimated spectrum. Unfortunately, the narrower the mainlobe, the higher the sidelobes, which is a typical trade-off in spectrum estimation. It turns out that the rectangular window allows for the best resolution but has the largest sidelobes.
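This trade-off is easy to quantify numerically. The sketch below (our own helper, not from the chapter) measures the peak sidelobe level of several common windows from the magnitude of their zero-padded FFTs; the rectangular window's first sidelobe sits only about 13 dB below its mainlobe peak, while the smoother windows buy much lower sidelobes at the price of wider mainlobes.

```python
import numpy as np

def peak_sidelobe_db(w, nfft=8192):
    """Peak sidelobe level of a window, in dB relative to the mainlobe peak."""
    W = 20 * np.log10(np.abs(np.fft.fft(w, nfft)) + 1e-12)
    W -= W.max()
    half = W[: nfft // 2]
    k = 1
    while k + 1 < len(half) and half[k + 1] <= half[k]:
        k += 1                       # walk down the mainlobe to its first null
    return half[k:].max()

N = 64
for name, w in [("rectangular", np.ones(N)), ("bartlett", np.bartlett(N)),
                ("hanning", np.hanning(N)), ("hamming", np.hamming(N)),
                ("blackman", np.blackman(N))]:
    print(f"{name:12s} peak sidelobe ~ {peak_sidelobe_db(w):6.1f} dB")
```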

14.4.2 The Bartlett Method

One approach to reducing the variance of the periodogram is to subdivide the observed data record into K nonoverlapping segments, find the periodogram of each segment, and finally evaluate the average of the periodograms so obtained. This spectrum estimator, also known as Bartlett’s estimator, has a variance that is smaller than the variance of the periodogram. Suppose that the number of data samples N is equal to KL, where K is the number of segments and L is their length. If the i-th segment is denoted by {x_i[n]}_0^{L−1}, i = 1, 2, ..., K, where

$$x_i[n] = x[n + (i-1)L], \qquad n \in \{0, 1, \cdots, L-1\} \qquad (14.47)$$

and its periodogram by

$$\hat{P}^{(i)}_{\mathrm{PER}}(f) = \frac{1}{L} \left| \sum_{n=0}^{L-1} x_i[n]\, e^{-j 2\pi f n} \right|^2 \qquad (14.48)$$

then the Bartlett spectrum estimator is

$$\hat{P}_{\mathrm{B}}(f) = \frac{1}{K} \sum_{i=1}^{K} \hat{P}^{(i)}_{\mathrm{PER}}(f). \qquad (14.49)$$

This estimator is consistent, and its variance is reduced by a factor of K compared to the variance of the periodogram. This reduction, however, is paid for by a decrease in resolution: the Bartlett estimator has a resolution K times lower than that of the periodogram. Thus, this estimator allows for a straightforward trading of resolution for variance.
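A minimal Python sketch of Eqs. (14.47) through (14.49) follows; the function name and the FFT evaluation grid are our own choices.

```python
import numpy as np

def bartlett_psd(x, K, nfft=1024):
    """Bartlett estimate, Eq. (14.49): average of K nonoverlapping periodograms."""
    x = np.asarray(x)
    L = len(x) // K                        # segment length, assuming N = K * L
    segs = x[: K * L].reshape(K, L)        # x_i[n] = x[n + (i-1)L], Eq. (14.47)
    P = np.abs(np.fft.fft(segs, nfft, axis=1)) ** 2 / L   # Eq. (14.48) per segment
    return P.mean(axis=0)                  # average over the K segments
```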

14.4.3 The Welch Method

The Welch method is another estimator that exploits the periodogram. It is based on the same idea as Bartlett’s approach of splitting the data into segments and finding the average of their periodograms. The difference is that the segments are overlapped, with overlaps of usually 50% or 75%, and the data within each segment are windowed. Let the length of the segments be L, let the i-th segment again be denoted by {x_i[n]}_0^{L−1}, and let the offset of successive segments be D samples. Then

$$N = L + D(K - 1) \qquad (14.50)$$

where N is the total number of observed samples and K the total number of sequences. Note that if there is no overlap, K = N/L, and if there is 50% overlap, K = 2 N/L − 1. The i-th sequence is

defined by
$$x_i[n] = x[n + (i-1)D], \qquad n \in \{0, 1, \cdots, L-1\} \qquad (14.51)$$

where i = 1 , 2 , · · · , K, and its periodogram by

$$\hat{P}^{(i)}_{\mathrm{M}}(f) = \frac{1}{L} \left| \sum_{n=0}^{L-1} w[n]\, x_i[n]\, e^{-j 2\pi f n} \right|^2. \qquad (14.52)$$

Here P̂_M^(i)(f) is the modified periodogram of the data, so called because the samples x[n] are weighted by a nonrectangular window w[n]. The Welch spectrum estimate is then given by

$$\hat{P}_{\mathrm{W}}(f) = \frac{1}{K} \sum_{i=1}^{K} \hat{P}^{(i)}_{\mathrm{M}}(f). \qquad (14.53)$$

By permitting overlap of sequences, we can form more segments than in the case of Bartlett’s method. Also, if we keep the same number of segments, the overlap allows for longer segments. The increased number of segments reduces the variance of the estimator, and the longer segments improve its resolution. Thus, with the Welch method we can trade reduction in variance for improvement in resolution in many more ways than with the Bartlett method. It can be shown that if the overlap is 50%, the variance of the Welch estimator is approximately 9/16 of the variance of the Bartlett estimator [8].
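The segmentation and windowing of Eqs. (14.51) through (14.53) can be sketched in a few lines of Python; the Hanning window and grid size are our example choices.

```python
import numpy as np

def welch_psd(x, L, D, nfft=1024):
    """Welch estimate, Eqs. (14.52)-(14.53): averaged modified periodograms of
    overlapped segments of length L with offset D (N = L + D(K-1))."""
    x = np.asarray(x)
    w = np.hanning(L)                      # one common choice of data window
    K = (len(x) - L) // D + 1              # number of segments
    P = np.zeros(nfft)
    for i in range(K):
        seg = w * x[i * D : i * D + L]     # x_i[n], Eq. (14.51), windowed
        P += np.abs(np.fft.fft(seg, nfft)) ** 2 / L   # Eq. (14.52)
    return P / K                           # Eq. (14.53)
```

SciPy's scipy.signal.welch implements the same segment averaging (with an additional normalization by the window energy) and is a convenient cross-check.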

14.4.4 Blackman-Tukey Method

The periodogram can be expressed in terms of the estimated autocorrelation lags as

$$\hat{P}_{\mathrm{PER}}(f) = \sum_{k=-(N-1)}^{N-1} \hat{r}[k]\, e^{-j 2\pi f k} \qquad (14.54)$$

where

$$\hat{r}[k] = \begin{cases} \dfrac{1}{N} \displaystyle\sum_{n=0}^{N-1-k} x^*[n]\, x[n+k], & k = 0, 1, \cdots, N-1 \\[2ex] \hat{r}^*[-k], & k = -(N-1), -(N-2), \cdots, -1. \end{cases} \qquad (14.55)$$

From Eqs. (14.54) and (14.55) we see that the estimated autocorrelation lags are given the same weight in the periodogram, regardless of the differences in their variances. From Eq. (14.55), however, it is obvious that the autocorrelations with smaller lags will be estimated more accurately than the ones with lags close to N, because of the different numbers of terms used in the summation. For example, r̂[N−1] is computed from the single term x*[0]x[N−1], compared to the N terms used in the computation of r̂[0]. Therefore, the large variance of the periodogram can be ascribed to the large weight given to the poor autocorrelation estimates used in its evaluation. Blackman and Tukey proposed to weight the autocorrelation sequence so that the autocorrelations with higher lags are weighted less [3]. Their estimator is given by

$$\hat{P}_{\mathrm{BT}}(f) = \sum_{k=-(N-1)}^{N-1} w[k]\, \hat{r}[k]\, e^{-j 2\pi f k} \qquad (14.56)$$

where the window w[k] is real nonnegative, symmetric, and nonincreasing with |k|, that is

1. 0 ≤ w[k] ≤ w[0] = 1,
2. w[−k] = w[k], and
3. w[k] = 0 for M < |k|, where M ≤ N − 1.   (14.57)
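A Python sketch of Eq. (14.56) follows, assuming a triangular lag window that satisfies the three conditions above; the window choice and FFT grid are ours.

```python
import numpy as np

def blackman_tukey_psd(x, M, nfft=1024):
    """Blackman-Tukey estimate, Eq. (14.56), with a triangular lag window
    (a sketch; requires nfft > 2*M)."""
    x = np.asarray(x)
    N = len(x)
    # biased autocorrelation estimates for lags 0..M, Eq. (14.55)
    r = np.array([np.sum(np.conj(x[:N - k]) * x[k:]) / N for k in range(M + 1)])
    w = 1.0 - np.arange(M + 1) / (M + 1)       # triangular window, w[0] = 1
    c = np.zeros(nfft, dtype=complex)
    c[: M + 1] = w * r                         # lags 0..M in FFT order
    c[nfft - M :] = (w[1:] * np.conj(r[1:]))[::-1]   # lags -M..-1 wrap to the end
    return np.fft.fft(c).real                  # real PSD on the grid f_k = k/nfft
```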

dependent and optimized to minimize their response to components outside the band of interest. If the impulse response of the filter centered at f₀ is h(f₀), then it is desired to minimize

$$\rho = \int_{0}^{1} |H(f)|^2\, P(f)\, df \qquad (14.65)$$

subject to the constraint
$$H(f_0) = 1 \qquad (14.66)$$

where H(f) is the DTFT of h(f₀). This is a constrained minimization problem, and the solution provides the optimal impulse response. When the solutions are used to determine the PSD of the observed data, we obtain the minimum variance (MV) spectrum estimator

$$\hat{P}_{\mathrm{MV}}(f) = \frac{N}{\mathbf{e}^H(f)\, \hat{\mathbf{R}}^{-1}\, \mathbf{e}(f)} \qquad (14.67)$$

where R̂⁻¹ is the inverse of the N × N autocorrelation matrix R̂ defined by

$$\hat{\mathbf{R}} = \begin{bmatrix} \hat{r}[0] & \hat{r}[-1] & \hat{r}[-2] & \cdots & \hat{r}[-N+1] \\ \hat{r}[1] & \hat{r}[0] & \hat{r}[-1] & \cdots & \hat{r}[-N+2] \\ \vdots & \vdots & \vdots & & \vdots \\ \hat{r}[N-1] & \hat{r}[N-2] & \hat{r}[N-3] & \cdots & \hat{r}[0] \end{bmatrix}. \qquad (14.68)$$

The length of the FIR filter does not have to be N, especially if we want to avoid the use of the unreliable estimates of r[k]. If the length of the filter’s response is p < N, then the vector e(f), the autocorrelation matrix R̂, and the spectrum estimate P̂_MV(f) are defined by Eqs. (14.64), (14.68), and (14.67), respectively, with N replaced by p [12]. The MV estimator has better resolution than the periodogram and the Blackman-Tukey estimator. The resolution and the variance of the MV estimator depend on the choice of the filter length p. If p is large, the bandwidth of the filter is small, which allows for better resolution. A larger p, however, requires more autocorrelation lags in the autocorrelation matrix R̂, which increases the variance of the estimated spectrum. Again, we have a trade-off between resolution and variance.
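The length-p variant is compact to sketch in Python; we assume the steering vector e(f) = [1, e^{j2πf}, ..., e^{j2π(p−1)f}]^T, consistent with the matrix form above.

```python
import numpy as np
from scipy.linalg import toeplitz

def minimum_variance_psd(x, p, freqs):
    """MV estimate of Eq. (14.67) with filter length p < N (a sketch)."""
    x = np.asarray(x)
    N = len(x)
    r = np.array([np.sum(np.conj(x[:N - k]) * x[k:]) / N for k in range(p)])
    Rinv = np.linalg.inv(toeplitz(r))      # p x p R-hat; may need regularization
    k = np.arange(p)
    P = np.empty(len(freqs))
    for i, f in enumerate(np.asarray(freqs)):
        e = np.exp(2j * np.pi * f * k)     # steering vector e(f)
        P[i] = np.real(p / (e.conj() @ Rinv @ e))
    return P

# example grid: P = minimum_variance_psd(x, p=32, freqs=np.arange(512)/512)
```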

14.4.6 Multiwindow Spectrum Estimator

Many efforts have been made to improve the performance of the periodogram by multiplying the data with a nonrectangular window. The introduction of such windows has been more or less ad hoc, although they have been constructed to have narrow mainlobes and low sidelobes. By contrast, Thomson has proposed a spectrum estimation method that also involves use of windows but is derived from fundamental principles. The method is based on the approximate solution of a Fredholm equation using an eigenexpansion [21]. The method amounts to applying multiple windows to the data, where the windows are discrete prolate spheroidal (Slepian) sequences. These sequences are orthogonal and their Fourier transforms have the maximum energy concentration in a given bandwidth W. The multiwindow (MW) spectrum estimator is given by [21]

$$\hat{P}_{\mathrm{MW}}(f) = \frac{1}{m} \sum_{i=0}^{m-1} \hat{P}_i(f) \qquad (14.69)$$

where P̂_i(f) is the i-th eigenspectrum, defined by

$$\hat{P}_i(f) = \frac{1}{\lambda_i} \left| \sum_{n=0}^{N-1} x[n]\, w_i[n]\, e^{-j 2\pi f n} \right|^2 \qquad (14.70)$$

with w_i[n] being the i-th Slepian sequence, λ_i the i-th Slepian eigenvalue, and W the analysis bandwidth. The steps for obtaining P̂_MW(f) are [22]:

1. Selection of the analysis bandwidth W, whose typical values are between 1.5/N and 20/N. The number of windows m depends on the selected W and is given by ⌊2NW⌋, where ⌊x⌋ denotes the largest integer less than or equal to x. The spectrum estimator has a resolution equal to W.
2. Evaluation of the m eigenspectra according to Eq. (14.70), where the Slepian sequences and eigenvalues satisfy
$$\mathbf{C}\,\mathbf{w}_i = \lambda_i\, \mathbf{w}_i \qquad (14.71)$$
with the elements of the matrix C given by
$$c_{mn} = \frac{\sin(2\pi W(m-n))}{\pi(m-n)}, \qquad m, n = 1, 2, \cdots, N. \qquad (14.72)$$

In the evaluation of the eigenspectra, only the Slepian sequences that correspond to the m largest eigenvalues of C are used.

3. Computation of the average spectrum according to Eq. (14.69).

If the spectrum is mixed, that is, if the observed data contain harmonics, the MW method uses a likelihood ratio test to determine whether harmonics are present. If the test shows that there is a harmonic around the frequency f₀, the spectrum is reshaped by adding an impulse at f₀, followed by a correction of the “local” spectrum for the inclusion of the impulse. For details, see [9, 21].

The MW method is consistent, and its variance for fixed W tends to zero as 1 /N when N → ∞. The variance, however, as well as the bias and the resolution, depend on the bandwidth W.
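The eigenspectra are easy to compute with SciPy's discrete prolate spheroidal sequences. The sketch below follows Eqs. (14.69) and (14.70) under the stated rule m = ⌊2NW⌋; the time-bandwidth product NW = 4 is our example choice, not a value prescribed by the text.

```python
import numpy as np
from scipy.signal.windows import dpss

def multiwindow_psd(x, NW=4.0, nfft=1024):
    """Thomson's multiwindow estimate, Eqs. (14.69)-(14.70), a sketch.
    NW is the time-bandwidth product N*W; m = floor(2*N*W) tapers are used."""
    x = np.asarray(x)
    N = len(x)
    m = int(2 * NW)                                  # number of windows, floor(2NW)
    tapers, ratios = dpss(N, NW, Kmax=m, return_ratios=True)
    P = np.zeros(nfft)
    for w_i, lam_i in zip(tapers, ratios):           # i-th Slepian sequence, eigenvalue
        P += np.abs(np.fft.fft(w_i * x, nfft)) ** 2 / lam_i   # Eq. (14.70)
    return P / m                                     # Eq. (14.69)
```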

14.5 Parametric Spectrum Estimation

A philosophically different approach to spectrum estimation of a random process is the parametric one, which is based on the assumption that the process can be described by a parametric model. Based on the model, the spectrum of the process can then be expressed in terms of the parameters of the model. The approach thus consists of three steps: (1) selection of an appropriate parametric model (usually based on a priori knowledge about the process), (2) estimation of the model parameters, and (3) computation of the spectrum using the so-obtained parameters. In the literature the parametric spectrum estimation methods are known as high-resolution methods because they can achieve better resolution than the nonparametric methods. The most frequently used models in the literature are the autoregressive (AR), the moving average (MA), the autoregressive moving average (ARMA), and the sum of harmonics (complex sinusoids) embedded in noise. With the AR model we assume that the observed data have been generated by a system whose input-output difference equation is given by

$$x[n] = -\sum_{k=1}^{p} a_k\, x[n-k] + e[n] \qquad (14.73)$$

where x[n] is the observed output of the system, e[n] is the unobserved input of the system, and the a_k’s are its coefficients. The input e[n] is a zero mean white noise process with unknown variance σ², and p is the order of the system. This model is usually abbreviated as AR(p). The MA model is

and
$$\hat{\mathbf{r}} = \left[ \hat{r}[1]\ \ \hat{r}[2]\ \cdots\ \hat{r}[p] \right]^T. \qquad (14.82)$$

The parameters a are estimated by
$$\hat{\mathbf{a}} = -\hat{\mathbf{R}}^{-1}\, \hat{\mathbf{r}} \qquad (14.83)$$

and the noise variance is found from

$$\hat{\sigma}^2 = \hat{r}[0] + \sum_{k=1}^{p} \hat{a}_k\, \hat{r}^*[k]. \qquad (14.84)$$

The PSD estimate is obtained when â and σ̂² are substituted in Eq. (14.77). This approach for estimating the AR parameters is known in the literature as the autocorrelation method. Many other AR estimation procedures have been proposed, including the maximum likelihood method, the covariance method, and the Burg method [12]. Burg’s work in the late sixties has a special place in the history of spectrum estimation because it kindled the interest in this field. Burg showed that the AR model provides an extrapolation of a known autocorrelation sequence r[k], |k| ≤ p, for |k| beyond p, so that the spectrum corresponding to the extrapolated sequence is the flattest of all spectra consistent with the 2p + 1 known autocorrelations [4]. An important issue in finding the AR PSD is the order of the assumed AR model. There exist several model order selection procedures, but the most widely used are the Information Criterion A (AIC) due to Akaike [2] and the Information Criterion B (BIC), also known as the Minimum Description Length (MDL) principle, of Rissanen [16] and Schwarz [20]. According to the AIC criterion, the best model is the one that minimizes the function AIC(k) over k, defined by

$$AIC(k) = N \log \hat{\sigma}_k^2 + 2k \qquad (14.85)$$

where k is the model order and σ̂_k² is the estimated noise variance of the model of order k. Similarly, the MDL criterion chooses the order that minimizes the function MDL(k), defined by

$$MDL(k) = N \log \hat{\sigma}_k^2 + k \log N \qquad (14.86)$$

where N is the number of observed data samples. It is important to emphasize that the MDL rule can be derived if, as a criterion for model selection, we use the maximum a posteriori principle. It has been found that the AIC is an inconsistent criterion whereas the MDL rule is consistent. Consistency here means that the probability of choosing the correct model order tends to one as N → ∞. The AR-based spectrum estimation methods show very good performance if the processes are narrowband and have sharp peaks in their spectra. Also, many good results have been reported when they are applied to short data records.
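The autocorrelation method and the two order-selection rules fit in a few lines of Python. Eq. (14.77) is not visible in this preview, so the PSD expression assumed below is the standard AR form σ²/|1 + Σ a_k e^{−j2πfk}|², matching Eq. (14.31).

```python
import numpy as np
from scipy.linalg import toeplitz

def ar_fit(x, p):
    """AR(p) fit by the autocorrelation method, Eqs. (14.83)-(14.84)."""
    x = np.asarray(x)
    N = len(x)
    r = np.array([np.sum(np.conj(x[:N - k]) * x[k:]) / N for k in range(p + 1)])
    a = -np.linalg.solve(toeplitz(r[:p]), r[1:])     # Eq. (14.83)
    sigma2 = np.real(r[0] + a @ np.conj(r[1:]))      # Eq. (14.84)
    return a, sigma2

def ar_psd(a, sigma2, nfft=1024):
    """AR PSD, sigma^2 / |1 + sum_k a_k e^{-j 2 pi f k}|^2, on the FFT grid."""
    A = np.fft.fft(np.concatenate(([1.0], a)), nfft)
    return sigma2 / np.abs(A) ** 2

def pick_order(x, kmax):
    """Order selection by AIC, Eq. (14.85), and MDL, Eq. (14.86)."""
    N = len(x)
    aic, mdl = {}, {}
    for k in range(1, kmax + 1):
        _, s2 = ar_fit(x, k)
        aic[k] = N * np.log(s2) + 2 * k
        mdl[k] = N * np.log(s2) + k * np.log(N)
    return min(aic, key=aic.get), min(mdl, key=mdl.get)
```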

14.5.2 Spectrum Estimation Based on Moving Average Models

The PSD of a moving average process is given by

$$P_{\mathrm{MA}}(f) = \sigma^2 \left| 1 + \sum_{k=1}^{q} b_k\, e^{-j 2\pi f k} \right|^2. \qquad (14.87)$$

It is not difficult to show that the r[k]’s of an MA(q) process are identically equal to zero for |k| > q, and that Eq. (14.87) can also be expressed as

$$P_{\mathrm{MA}}(f) = \sum_{k=-q}^{q} r[k]\, e^{-j 2\pi f k}. \qquad (14.88)$$

Thus, to find P̂_MA(f) it would be sufficient to estimate the autocorrelations r[k] and use the estimates in Eq. (14.88). Obviously, this estimate would be identical to P̂_BT(f) when the applied window is rectangular and of length 2q + 1. A different approach is to find estimates of the unknown MA coefficients and σ² and use them in Eq. (14.87). The equations for the MA coefficients are nonlinear, which makes their estimation difficult. Durbin has proposed an approximate procedure that is based on a high order AR approximation of the MA process. First the data are modeled by an AR model of order L, where L ≫ q. Its coefficients are estimated from Eq. (14.83) and σ̂² according to Eq. (14.84). Then the sequence 1, â₁, â₂, ..., â_L is fitted with an AR(q) model, whose parameters are also estimated using the autocorrelation method. The estimated coefficients b̂₁, b̂₂, ..., b̂_q are subsequently substituted in Eq. (14.87) together with σ̂². Good results with MA models are obtained when the PSD of the process is characterized by broad peaks and sharp nulls. The MA models should not be used for processes with narrowband features.
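Durbin's two-stage procedure can be sketched by reusing the ar_fit helper from the AR example above; the default AR order L = 4q is our own heuristic, chosen only to satisfy L ≫ q.

```python
import numpy as np

def durbin_ma_fit(x, q, L=None):
    """Durbin's approximate MA(q) estimate via a long AR(L) fit (a sketch)."""
    L = L if L is not None else 4 * q
    a_long, sigma2 = ar_fit(x, L)            # step 1: high-order AR model
    seq = np.concatenate(([1.0], a_long))    # the sequence 1, a_1, ..., a_L
    b, _ = ar_fit(seq, q)                    # step 2: fit AR(q) to that sequence
    return b, sigma2                         # b_k estimates and noise variance

def ma_psd(b, sigma2, nfft=1024):
    """Eq. (14.87) evaluated on the FFT grid."""
    B = np.fft.fft(np.concatenate(([1.0], b)), nfft)
    return sigma2 * np.abs(B) ** 2
```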

14.5.3 Spectrum Estimation Based on Autoregressive Moving Average Models

The PSD of a process that is represented by the ARMA model is given by

$$P_{\mathrm{ARMA}}(f) = \sigma^2\, \frac{\left| 1 + \sum_{k=1}^{q} b_k\, e^{-j 2\pi f k} \right|^2}{\left| 1 + \sum_{k=1}^{p} a_k\, e^{-j 2\pi f k} \right|^2}. \qquad (14.89)$$

The ML estimates of the ARMA coefficients are difficult to obtain, so we usually resort to methods that yield suboptimal estimates. For example, we can first estimate the AR coefficients based on the equation
$$\begin{bmatrix} \hat{r}[q] & \hat{r}[q-1] & \cdots & \hat{r}[q-p+1] \\ \hat{r}[q+1] & \hat{r}[q] & \cdots & \hat{r}[q-p+2] \\ \vdots & \vdots & & \vdots \\ \hat{r}[M-1] & \hat{r}[M-2] & \cdots & \hat{r}[M-p] \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} + \begin{bmatrix} \epsilon_{q+1} \\ \epsilon_{q+2} \\ \vdots \\ \epsilon_{M} \end{bmatrix} = - \begin{bmatrix} \hat{r}[q+1] \\ \hat{r}[q+2] \\ \vdots \\ \hat{r}[M] \end{bmatrix} \qquad (14.90)$$

or
$$\hat{\mathbf{R}}\, \mathbf{a} + \boldsymbol{\epsilon} = -\hat{\mathbf{r}} \qquad (14.91)$$

where ε_i is a term that models the errors in the Yule-Walker equations due to the estimation errors of the autocorrelation lags, and M ≥ p + q. From Eq. (14.91), we can find the least squares estimates of a by

$$\hat{\mathbf{a}} = -\left( \hat{\mathbf{R}}^H \hat{\mathbf{R}} \right)^{-1} \hat{\mathbf{R}}^H\, \hat{\mathbf{r}}. \qquad (14.92)$$

This procedure is known as the least-squares modified Yule-Walker equation method. Once the AR coefficients are estimated, we can filter the observed data

$$y[n] = x[n] + \sum_{k=1}^{p} \hat{a}_k\, x[n-k] \qquad (14.93)$$

and obtain a sequence that is approximately modeled by an MA(q) model. From the data y[n] we can estimate the MA PSD by Eq. (14.88) and obtain the PSD estimate of the data x[n]

$$\hat{P}_{\mathrm{ARMA}}(f) = \frac{\hat{P}_{\mathrm{MA}}(f)}{\left| 1 + \sum_{k=1}^{p} \hat{a}_k\, e^{-j 2\pi f k} \right|^2} \qquad (14.94)$$
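The whole pipeline of Eqs. (14.90) through (14.94) can be sketched as follows; the particular choice of M is our own, constrained only by M ≥ p + q.

```python
import numpy as np

def arma_psd(x, p, q, M=None, nfft=1024):
    """ARMA(p,q) PSD via least-squares modified Yule-Walker,
    Eqs. (14.90)-(14.94) (a sketch; requires nfft > 2*q)."""
    x = np.asarray(x)
    N = len(x)
    M = M if M is not None else p + q + 8
    r = np.array([np.sum(np.conj(x[:N - k]) * x[k:]) / N for k in range(M + 1)])
    rh = lambda k: r[k] if k >= 0 else np.conj(r[-k])    # r-hat[-k] = r-hat*[k]
    # Eq. (14.90): row i holds r[q+i], r[q+i-1], ..., r[q+i-p+1]
    Rm = np.array([[rh(q + i - j) for j in range(p)] for i in range(M - q)])
    rv = np.array([rh(q + 1 + i) for i in range(M - q)])
    a = np.linalg.lstsq(Rm, -rv, rcond=None)[0]          # Eq. (14.92)
    y = np.convolve(x, np.concatenate(([1.0], a)))[:N]   # Eq. (14.93)
    # MA-type spectrum of y, Eq. (14.88), divided by |A(f)|^2, Eq. (14.94)
    ry = np.array([np.sum(np.conj(y[:N - k]) * y[k:]) / N for k in range(q + 1)])
    c = np.zeros(nfft, dtype=complex)
    c[: q + 1] = ry
    c[nfft - q :] = np.conj(ry[1:])[::-1]
    A = np.fft.fft(np.concatenate(([1.0], a)), nfft)
    return np.fft.fft(c).real / np.abs(A) ** 2
```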

where
$$\mathbf{R}_s = \mathbf{E}\, \mathbf{P}\, \mathbf{E}^H \qquad (14.103)$$

for
$$\mathbf{E} = \left[ \mathbf{e}_1\ \ \mathbf{e}_2\ \cdots\ \mathbf{e}_m \right] \qquad (14.104)$$

and P a diagonal matrix

$$\mathbf{P} = \mathrm{diag}\left( |A_1|^2, |A_2|^2, \cdots, |A_m|^2 \right). \qquad (14.105)$$

If the matrix R_s is M × M, where M ≥ m, its rank will be equal to the number of complex sinusoids m. Another important representation of the autocorrelation matrix R is via its eigenvalues and eigenvectors, i.e.,

$$\mathbf{R} = \sum_{i=1}^{m} (\lambda_i + \sigma^2)\, \mathbf{v}_i \mathbf{v}_i^H + \sum_{i=m+1}^{M} \sigma^2\, \mathbf{v}_i \mathbf{v}_i^H \qquad (14.106)$$

where the λi ’s, i = 1 , 2 , · · · , m, are the nonzero eigenvalues of R s. Let the eigenvalues of R be arranged in decreasing order so that λ 1 ≥ λ 2 ≥ · · · ≥ λM , and let v i be the eigenvector corresponding to λ (^) i. The space spanned by the eigenvectors v i , i = 1 , 2 , · · · , m, is called the signal subspace, and the space spanned by v i , i = m + 1 , m + 2 , · · · , M, the noise subspace. Since the set of eigenvectors are orthonormal, that is

$$\mathbf{v}_i^H \mathbf{v}_l = \begin{cases} 1, & i = l \\ 0, & i \neq l \end{cases} \qquad (14.107)$$

the two subspaces are orthogonal. In other words, if s is in the signal subspace and z is in the noise subspace, then s^H z = 0. Now suppose that the matrix R is (m+1) × (m+1). Pisarenko observed that the noise variance corresponds to the smallest eigenvalue of R and that the frequencies of the complex sinusoids can be estimated by using the orthogonality of the signal and noise subspaces, that is,

$$\mathbf{e}_i^H\, \mathbf{v}_{m+1} = 0, \qquad i = 1, 2, \cdots, m. \qquad (14.108)$$

We can estimate the fi ’s by forming the pseudospectrum

$$\hat{P}_{\mathrm{PHD}}(f) = \frac{1}{\left| \mathbf{e}^H(f)\, \mathbf{v}_{m+1} \right|^2} \qquad (14.109)$$

which should theoretically be infinite at the frequencies f_i. In practice, however, the pseudospectrum does not exhibit peaks exactly at these frequencies because R is not known and, instead, is estimated from finite data records. The PSD estimate in Eq. (14.109) does not include information about the power of the noise and the complex sinusoids. The powers, however, can easily be obtained by using Eq. (14.98). First note that P_e(f) = σ², and σ̂² = λ_{m+1}. Second, the frequencies f_i are determined from the pseudospectrum Eq. (14.109), so it remains to find the powers of the complex sinusoids P_i = |A_i|². This can readily be accomplished by solving the set of m linear equations

$$\begin{bmatrix} |\hat{\mathbf{e}}_1^H \mathbf{v}_1|^2 & |\hat{\mathbf{e}}_2^H \mathbf{v}_1|^2 & \cdots & |\hat{\mathbf{e}}_m^H \mathbf{v}_1|^2 \\ |\hat{\mathbf{e}}_1^H \mathbf{v}_2|^2 & |\hat{\mathbf{e}}_2^H \mathbf{v}_2|^2 & \cdots & |\hat{\mathbf{e}}_m^H \mathbf{v}_2|^2 \\ \vdots & \vdots & & \vdots \\ |\hat{\mathbf{e}}_1^H \mathbf{v}_m|^2 & |\hat{\mathbf{e}}_2^H \mathbf{v}_m|^2 & \cdots & |\hat{\mathbf{e}}_m^H \mathbf{v}_m|^2 \end{bmatrix} \begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_m \end{bmatrix} = \begin{bmatrix} \lambda_1 - \hat{\sigma}^2 \\ \lambda_2 - \hat{\sigma}^2 \\ \vdots \\ \lambda_m - \hat{\sigma}^2 \end{bmatrix} \qquad (14.110)$$

where
$$\hat{\mathbf{e}}_i = \left[ 1\ \ e^{j 2\pi \hat{f}_i}\ \ e^{j 4\pi \hat{f}_i}\ \cdots\ e^{j 2\pi (N-1) \hat{f}_i} \right]^T. \qquad (14.111)$$

In summary, Pisarenko’s method consists of four steps:

1. Estimate the (m + 1) × (m + 1) autocorrelation matrix R̂ (provided it is known that the number of complex sinusoids is m).
2. Evaluate the minimum eigenvalue λ_{m+1} and the eigenvectors of R̂.
3. Set the white noise power to σ̂² = λ_{m+1}, estimate the frequencies of the complex sinusoids from the peak locations of P̂_PHD(f) in Eq. (14.109), and compute their powers from Eq. (14.110).
4. Substitute the estimated parameters in Eq. (14.98).

Pisarenko’s method is not used frequently in practice because its performance is much poorer than the performance of some other signal and noise subspace based methods developed later.
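For illustration, here is a Python sketch of steps 1 through 3; the powers would then follow from the linear system of Eq. (14.110). The peak picking is deliberately naive, and the frequency grid size is our own choice.

```python
import numpy as np
from scipy.linalg import toeplitz

def pisarenko(x, m, nfreq=4096):
    """Pisarenko harmonic decomposition for m complex sinusoids (a sketch)."""
    x = np.asarray(x)
    N = len(x)
    r = np.array([np.sum(np.conj(x[:N - k]) * x[k:]) / N for k in range(m + 1)])
    R = toeplitz(r)                            # (m+1) x (m+1) Hermitian R-hat
    lam, V = np.linalg.eigh(R)                 # eigenvalues in ascending order
    sigma2 = lam[0]                            # noise power = smallest eigenvalue
    v_min = V[:, 0]                            # noise-subspace eigenvector v_{m+1}
    f = np.arange(nfreq) / nfreq
    E = np.exp(2j * np.pi * np.outer(np.arange(m + 1), f))   # e(f) as columns
    pseudo = 1.0 / np.abs(v_min.conj() @ E) ** 2             # Eq. (14.109)
    # frequency estimates: the m strongest local maxima of the pseudospectrum
    pk = np.where((pseudo > np.roll(pseudo, 1)) & (pseudo > np.roll(pseudo, -1)))[0]
    f_hat = np.sort(f[pk[np.argsort(pseudo[pk])[::-1][:m]]])
    return f_hat, sigma2, pseudo
```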

14.5.5 Multiple Signal Classification (MUSIC)

A procedure very similar to Pisarenko’s is the MUltiple SIgnal Classification (MUSIC) method, which was proposed in the late 1970s by Schmidt [18]. Suppose again that the process {x̃[n]} is described by m complex sinusoids in white noise. If we form an M × M autocorrelation matrix R, find its eigenvalues and eigenvectors, and rank them as before, then, as mentioned in the previous subsection, the m eigenvectors corresponding to the m largest eigenvalues span the signal subspace, and the remaining eigenvectors the noise subspace. According to MUSIC, we estimate the noise variance from the M − m smallest eigenvalues of R̂,

$$\hat{\sigma}^2 = \frac{1}{M-m} \sum_{i=m+1}^{M} \lambda_i \qquad (14.112)$$

and the frequencies from the peak locations of the pseudospectrum

$$\hat{P}_{\mathrm{MU}}(f) = \frac{1}{\sum_{i=m+1}^{M} \left| \mathbf{e}(f)^H \mathbf{v}_i \right|^2}. \qquad (14.113)$$

It should be noted that there are other ways of estimating the f_i’s. Finally, the powers of the complex sinusoids are determined from Eq. (14.110), and all the estimated parameters are substituted in Eq. (14.98). MUSIC has better performance than Pisarenko’s method because of the averaging introduced via the extra noise eigenvectors. The averaging reduces the statistical fluctuations present in Pisarenko’s pseudospectrum, which arise due to the errors in estimating the autocorrelation matrix. These fluctuations can be further reduced by applying the Eigenvector method [11], which is a modification of MUSIC and whose pseudospectrum is given by

$$\hat{P}_{\mathrm{EV}}(f) = \frac{1}{\sum_{i=m+1}^{M} \frac{1}{\lambda_i} \left| \mathbf{e}(f)^H \mathbf{v}_i \right|^2}. \qquad (14.114)$$
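Both pseudospectra differ from Pisarenko's only in how the noise subspace is used, which the following sketch makes explicit; the flag ev_weighting and the grid size are our own naming choices.

```python
import numpy as np
from scipy.linalg import toeplitz

def music(x, m, M, nfreq=4096, ev_weighting=False):
    """MUSIC pseudospectrum, Eqs. (14.112)-(14.113); ev_weighting=True gives
    the Eigenvector method of Eq. (14.114). A sketch for m sinusoids, M > m."""
    x = np.asarray(x)
    N = len(x)
    r = np.array([np.sum(np.conj(x[:N - k]) * x[k:]) / N for k in range(M)])
    lam, V = np.linalg.eigh(toeplitz(r))       # ascending eigenvalues
    noise_vals, noise_vecs = lam[: M - m], V[:, : M - m]
    sigma2 = noise_vals.mean()                 # Eq. (14.112)
    f = np.arange(nfreq) / nfreq
    E = np.exp(2j * np.pi * np.outer(np.arange(M), f))      # e(f) as columns
    proj = np.abs(noise_vecs.conj().T @ E) ** 2             # |e(f)^H v_i|^2
    if ev_weighting:
        proj = proj / noise_vals[:, None]                   # weight by 1/lambda_i
    return f, 1.0 / proj.sum(axis=0), sigma2
```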

Pisarenko’s method, MUSIC, and its variants exploit the noise subspace to estimate the unknown parameters of the random process. There are, however, approaches that estimate the unknown parameters from vectors that lie in the signal subspace. The main idea there is to form a reduced rank autocorrelation matrix which is an estimate of the signal autocorrelation matrix. Since this estimate is formed from the m principal eigenvectors and eigenvalues, the methods based on them are called principal component spectrum estimation methods [8, 12]. Once the signal autocorrelation matrix is obtained, the frequencies of the complex sinusoids are found, followed by estimation of the remaining unknown parameters of the model.