Introduction
How well can the next sample of a signal be predicted from a linear combination of its past samples? This question lies at the heart of linear prediction. From the 1970s, Atal and Itakura applied it to speech analysis and coding, establishing what we now call LPC (Linear Predictive Coding). LPC remains a foundational technology behind speech codecs (GSM, CELP, AMR), formant estimation, vocal-tract modelling, and speech synthesis (e.g. the Klatt synthesizer).
The key idea of LPC is to model the signal as the output of an all-pole AR (autoregressive) filter and estimate the AR coefficients from the observed signal. The estimation problem reduces naturally from least squares to the Yule-Walker equations, whose Toeplitz structure can be solved in \(O(p^2)\) via the Levinson-Durbin algorithm.
This article covers the formulation of the linear-prediction model, AR-coefficient estimation, the derivation and Python implementation of Levinson-Durbin, and applications such as formant estimation and the LPC cepstrum. For prerequisites on spectral analysis and autocorrelation, see Autocorrelation and Cross-Correlation: Theory and Python Implementation and Fast Fourier Transform (FFT): Mechanism and Python Implementation.
Linear Prediction Model
Prediction Equation and Residual
In a linear prediction model of order \(p\) , the current sample \(x[n]\) is predicted as a linear combination of the previous \(p\) samples:
\[\hat{x}[n] = -\sum_{k=1}^{p} a_k\, x[n-k] \tag{1}\]The \(\{a_k\}_{k=1}^{p}\) are the LPC (prediction) coefficients; the negative sign follows the AR-model convention. The prediction error (residual) \(e[n]\) is
\[e[n] = x[n] - \hat{x}[n] = x[n] + \sum_{k=1}^{p} a_k\, x[n-k]. \tag{2}\]Rearranged, the signal \(x[n]\) is the output of an all-pole filter driven by the residual \(e[n]\) :
\[x[n] = -\sum_{k=1}^{p} a_k\, x[n-k] + e[n]. \tag{3}\]This is precisely the AR(p) model, with transfer function
\[H(z) = \frac{1}{A(z)} = \frac{1}{1 + \sum_{k=1}^{p} a_k z^{-k}}. \tag{4}\]\(A(z)\) is called the prediction-error filter. In speech, \(H(z)\) corresponds to the vocal-tract transfer function and \(e[n]\) to the excitation (glottal pulses or noise).
Least-Squares Coefficient Estimation
The LPC coefficients minimize the sum of squared residuals,
\[J = \sum_{n} e[n]^2 = \sum_n \left( x[n] + \sum_{k=1}^{p} a_k\, x[n-k] \right)^2. \tag{5}\]Setting \(\partial J / \partial a_i = 0\) ,
\[\frac{\partial J}{\partial a_i} = 2 \sum_n \left( x[n] + \sum_{k=1}^{p} a_k\, x[n-k] \right) x[n-i] = 0, \tag{6}\]and rearranging gives, for \(i = 1, \ldots, p\) ,
\[\sum_{k=1}^{p} a_k \sum_n x[n-k]\, x[n-i] = -\sum_n x[n]\, x[n-i]. \tag{7}\]These are the normal equations.
Yule-Walker Equations
Autocorrelation Form
If the summation extends over all samples (e.g. via windowing or stationarity), \(\sum_n x[n-k]\,x[n-i]\) equals the autocorrelation at lag \(|i - k|\) :
\[r[m] = \sum_n x[n]\, x[n+m]. \tag{8}\]Substituting into Eq. (7) yields, for \(i = 1, \ldots, p\) ,
\[\sum_{k=1}^{p} a_k\, r[|i-k|] = -r[i]. \tag{9}\]These are the Yule-Walker equations. In matrix form,
\[ \begin{bmatrix} r[0] & r[1] & \cdots & r[p-1] \\ r[1] & r[0] & \cdots & r[p-2] \\ \vdots & \vdots & \ddots & \vdots \\ r[p-1] & r[p-2] & \cdots & r[0] \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = - \begin{bmatrix} r[1] \\ r[2] \\ \vdots \\ r[p] \end{bmatrix}. \tag{10} \]The left-hand-side matrix is symmetric and Toeplitz: every entry on the main diagonal equals \(r[0]\) , every entry on the next diagonal equals \(r[1]\) , and so on. This special structure is the key to the fast solver below.
Residual Power
Substituting the optimal coefficients into Eq. (5) gives the minimum residual power
\[E_p = r[0] + \sum_{k=1}^{p} a_k\, r[k]. \tag{11}\]\(E_p\) is monotonically non-increasing in \(p\) . Increasing \(p\) improves prediction but may overfit and start tracking glottal harmonics, so a common rule of thumb in speech is \(p \approx f_s / 1000 + 2 \text{ to } 4\) (e.g. \(p = 10\text{--}12\) at 8 kHz).
Levinson-Durbin Algorithm
Motivation
A general Gauss elimination on the \(p \times p\) system in Eq. (10) is \(O(p^3)\) , but exploiting the Toeplitz structure reduces this to \(O(p^2)\) . The recursion also constructs higher-order solutions from lower-order ones, which is convenient for model-order selection. This is the Levinson-Durbin recursion.
Recursion
Let \(a^{(m)}_k\) denote the LPC coefficients at order \(m\) , and \(E_m\) the corresponding residual power. Initialize with
\[E_0 = r[0],\qquad a^{(0)} = \emptyset. \tag{12}\]Updating from order \(m\) to \(m+1\) goes through the reflection coefficient (PARCOR) \(k_{m+1}\) :
\[k_{m+1} = -\frac{r[m+1] + \sum_{j=1}^{m} a^{(m)}_j\, r[m+1-j]}{E_m}. \tag{13}\]The new coefficients are
\[a^{(m+1)}_{m+1} = k_{m+1}, \tag{14}\] \[a^{(m+1)}_j = a^{(m)}_j + k_{m+1}\, a^{(m)}_{m+1-j}, \quad j = 1, \ldots, m. \tag{15}\]The residual power updates as
\[E_{m+1} = (1 - k_{m+1}^2)\, E_m. \tag{16}\]The fact that \(|k_{m+1}| < 1\) at every step guarantees the minimum-phase property of \(A(z)\) , and hence the stability of the all-pole synthesis filter \(H(z) = 1/A(z)\) .
Physical Meaning of Reflection Coefficients
The reflection coefficient \(k_m\) corresponds to the boundary reflection ratio of a concatenation of uniform-section acoustic tubes modelling the vocal tract. This shows that LPC is more than a statistical model — it is tied to the physical structure of the vocal apparatus. \(|k_m| \to 1\) is total reflection; \(|k_m| \to 0\) corresponds to no reflection (a continuous tube).
Python Implementation
Levinson-Durbin from Autocorrelation
We implement Levinson-Durbin from scratch with NumPy and verify it on a known AR process.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import lfilter
def autocorrelate(x, p):
"""
Compute autocorrelation at lags 0..p (biased estimator, no 1/N).
"""
N = len(x)
n_fft = 2 ** int(np.ceil(np.log2(2 * N)))
X = np.fft.fft(x, n=n_fft)
r = np.fft.ifft(np.abs(X) ** 2).real[: p + 1]
return r
def levinson_durbin(r, p):
"""
Solve the Yule-Walker equations using the Levinson-Durbin recursion.
Parameters
----------
r : np.ndarray
Autocorrelation r[0], r[1], ..., r[p].
p : int
Model order.
Returns
-------
a : np.ndarray
LPC coefficients [a_1, ..., a_p] (a_0 = 1 is not included).
E : float
Minimum residual power.
k : np.ndarray
Reflection (PARCOR) coefficients [k_1, ..., k_p].
"""
a = np.zeros(p)
k_arr = np.zeros(p)
E = r[0]
if E <= 0:
return a, E, k_arr
for m in range(p):
# Reflection coefficient k_{m+1}
acc = r[m + 1]
for j in range(m):
acc += a[j] * r[m - j]
k = -acc / E
# Coefficient update (snapshot before mutation)
a_new = a.copy()
a_new[m] = k
for j in range(m):
a_new[j] = a[j] + k * a[m - 1 - j]
a = a_new
# Residual power
E = (1.0 - k * k) * E
k_arr[m] = k
return a, E, k_arr
def lpc(signal, p):
"""Convenience wrapper: estimate LPC coefficients directly from the signal."""
r = autocorrelate(signal, p)
return levinson_durbin(r, p)
# --- Verification: generate a known AR(4) and recover its coefficients ---
np.random.seed(42)
true_a = np.array([1.0, -2.2, 2.8, -1.8, 0.5]) # A(z) = 1 + a_1 z^-1 + ...
N = 4000
e = np.random.randn(N) # white excitation
x = lfilter([1.0], true_a, e) # AR(4) process
p = 4
a_est, E, k = lpc(x, p)
print("True AR coefficients (a_1..a_p):", true_a[1:])
print("Estimated AR coefficients :", a_est)
print(f"Reflection coefficients: {k}")
print(f"Minimum residual power: {E:.4f}")
The estimated coefficients match the true values within sampling fluctuation, and every reflection coefficient satisfies \(|k_m| < 1\) .
Comparison with scipy.linalg.solve_toeplitz
For pedagogical clarity the hand-coded version above is useful, but in practice scipy.linalg.solve_toeplitz is more concise.
import numpy as np
from scipy.linalg import solve_toeplitz
def lpc_toeplitz(signal, p):
"""
Reference LPC implementation using a Toeplitz solver.
Returns the same solution as Levinson-Durbin.
"""
r = autocorrelate(signal, p)
a = solve_toeplitz(r[:p], -r[1 : p + 1])
E = r[0] + np.dot(a, r[1 : p + 1])
return a, E
a_ref, E_ref = lpc_toeplitz(x, p)
print("Toeplitz-solver coefficients:", a_ref)
print("Difference vs. Levinson-Durbin:", np.max(np.abs(a_ref - a_est)))
The two solutions agree to floating-point precision (around \(10^{-12}\) ).
LPC Spectral Envelope
A canonical LPC visualization overlays the magnitude response of the all-pole filter \(H(z) = 1/A(z)\) (the LPC spectrum) on the raw signal spectrum. The LPC spectrum smoothly tracks the envelope of the raw spectrum while ignoring the harmonic fine structure.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import lfilter, freqz
def synth_vowel(fs=8000, duration=0.04, f0=140, ar=None, noise=0.02, seed=0):
"""Synthetic vowel: glottal pulse train through an AR vocal-tract filter."""
np.random.seed(seed)
t = np.arange(0, duration, 1 / fs)
glottal = np.zeros_like(t)
period = int(fs / f0)
for k in range(0, len(t), period):
L = min(30, len(t) - k)
glottal[k : k + L] = np.exp(-np.arange(L) * 0.15)
voice = lfilter([1.0], ar, glottal)
voice = voice / (np.max(np.abs(voice)) + 1e-8)
voice += noise * np.random.randn(len(voice))
return t, voice
# Vowel "a"-like AR(4): formants F1≈800, F2≈1200 Hz
fs = 8000
ar_true = [1.0, -2.2, 2.8, -1.8, 0.5]
t, voice = synth_vowel(fs=fs, duration=0.04, f0=140, ar=ar_true, noise=0.02)
# LPC estimation (windowed to reduce spectral leakage)
window = np.hamming(len(voice))
a_est, E, _ = lpc(voice * window, p=12)
A = np.concatenate(([1.0], a_est)) # A(z) = 1 + a_1 z^-1 + ... + a_p z^-p
# Raw spectrum
n_fft = 1024
X = np.fft.rfft(voice * window, n=n_fft)
freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
log_spec = 20 * np.log10(np.abs(X) + 1e-10)
# LPC spectrum: H(e^{jw}) = sqrt(E) / A(e^{jw})
w, H = freqz([np.sqrt(E)], A, worN=n_fft // 2 + 1, fs=fs)
lpc_spec_db = 20 * np.log10(np.abs(H) + 1e-10)
fig, ax = plt.subplots(figsize=(11, 5))
ax.plot(freqs, log_spec, color="steelblue", alpha=0.6, label="Log spectrum (windowed)")
ax.plot(w, lpc_spec_db, color="red", linewidth=2, label="LPC envelope (p=12)")
ax.set_xlabel("Frequency [Hz]")
ax.set_ylabel("Magnitude [dB]")
ax.set_title("LPC Spectral Envelope vs. Raw Spectrum")
ax.set_xlim(0, fs / 2)
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
The LPC spectrum traces the formant peaks smoothly while ignoring the glottal harmonic ripple.
Formant Estimation
Formants correspond to the roots (poles) of \(A(z)\) . For a pole \(z = r e^{j\theta}\) with \(r\) close to the unit circle, the resonance has
\[F = \frac{\theta}{2\pi}\, f_s,\qquad \mathrm{BW} = -\frac{\ln r}{\pi}\, f_s. \tag{17}\]import numpy as np
def lpc_formants(a, fs, max_bw=400.0, min_freq=90.0):
"""
Extract formant frequencies and bandwidths from LPC coefficients.
Parameters
----------
a : np.ndarray
LPC coefficients [a_1, ..., a_p] (a_0 = 1 not included).
fs : float
Sampling frequency [Hz].
max_bw : float
Maximum bandwidth [Hz] for a pole to be counted as a formant.
min_freq : float
Minimum frequency [Hz] for a pole to be counted as a formant.
Returns
-------
formants : list of (freq_Hz, bandwidth_Hz)
"""
A = np.concatenate(([1.0], a))
roots = np.roots(A)
# Keep only roots in the upper half-plane (avoid conjugate duplicates)
roots = roots[np.imag(roots) > 0]
formants = []
for r in roots:
theta = np.arctan2(np.imag(r), np.real(r))
magnitude = np.abs(r)
f = theta / (2 * np.pi) * fs
bw = -np.log(max(magnitude, 1e-10)) / np.pi * fs
if f >= min_freq and bw <= max_bw:
formants.append((f, bw))
return sorted(formants, key=lambda x: x[0])
formants = lpc_formants(a_est, fs)
for i, (f, bw) in enumerate(formants[:4], start=1):
print(f"F{i}: {f:7.1f} Hz (BW = {bw:5.1f} Hz)")
The lower formants F1 and F2 are the dominant perceptual cues for vowel identity, and the classical F1-F2 plane is widely used to separate vowels.
Relation to the Cepstrum: LPC Cepstrum
There is a direct recursion (Schroeder’s formula) that converts LPC coefficients to cepstral coefficients \(c[m]\) . With \(a_0 = 1\) ,
\[c[1] = -a_1, \tag{18}\] \[c[m] = -a_m - \sum_{k=1}^{m-1} \frac{k}{m}\, c[k]\, a_{m-k}, \qquad m = 2, 3, \ldots \tag{19}\]LPC cepstral coefficients are linear in the log-spectral domain, which makes distance computations stable, and they have long been used as features for speech and speaker recognition. Compared to the FFT-based cepstrum (see Cepstrum Analysis: Theory and Python Implementation), the LPC cepstrum gives a smoother representation of the low-quefrency (vocal-tract) region.
import numpy as np
def lpc_to_cepstrum(a, n_cep):
"""
Convert LPC coefficients [a_1, ..., a_p] to LPC cepstrum c[1..n_cep].
"""
p = len(a)
c = np.zeros(n_cep)
for m in range(1, n_cep + 1):
if m <= p:
c[m - 1] = -a[m - 1]
# Recursive term
s = 0.0
for k_ in range(1, m):
if (m - k_) <= p:
s += (k_ / m) * c[k_ - 1] * a[m - k_ - 1]
c[m - 1] -= s
return c
cep = lpc_to_cepstrum(a_est, n_cep=20)
print("LPC cepstrum c[1..10]:", cep[:10])
Practical Considerations
| Item | Recommendation | ||
|---|---|---|---|
| Sampling frequency | 8 kHz (telephone band) or 16 kHz (wideband speech) | ||
| Model order \(p\) | \(p \approx f_s/1000 + 2 \text{ to } 4\) (10–12 at 8 kHz) | ||
| Frame length | 20–30 ms (within which stationarity is approximately valid) | ||
| Window | Hamming / Hanning (suppresses spurious poles from edge jumps) | ||
| Pre-emphasis | \(y[n] = x[n] - 0.95\, x[n-1]\) (flattens spectral tilt) | ||
| Stability | $ | k_m | < 1\( from Levinson-Durbin guarantees stable \) 1/A(z)$ |
| Excessive order | Tracks glottal harmonics → spurious formants | ||
| Non-stationary intervals | Plosives and boundaries → re-segment frames |
Pre-emphasis is especially important: it compensates the roughly \(-12\) dB/octave glottal source tilt, prevents the poles from being biased toward low frequencies, and improves the accuracy of formant bandwidth estimates.
Summary
- The linear-prediction model predicts the current sample from a linear combination of \(p\) past samples; it is an all-pole AR model whose residual is whitened.
- Minimizing the sum of squared residuals leads to the normal equations, which in autocorrelation form become the Yule-Walker equations with a symmetric Toeplitz matrix.
- The Levinson-Durbin algorithm updates coefficients order by order through the reflection coefficient \(k_m\) , solving the system in \(O(p^2)\) .
- Reflection coefficients always satisfy \(|k_m| < 1\) , and they correspond physically to boundary reflection ratios in an acoustic-tube model of the vocal tract.
- The LPC spectrum \(1/|A(e^{j\omega})|\) captures the spectral envelope while ignoring glottal harmonics, and the poles of \(A(z)\) correspond to formants.
- Schroeder’s recursion converts LPC coefficients into the LPC cepstrum, a long-standing feature for speech and speaker recognition.
- In practice, pre-emphasis, an appropriate window, and the rule of thumb \(p \approx f_s/1000 + 2 \text{ to } 4\) for the model order are decisive for quality.
Related Articles
- Autocorrelation and Cross-Correlation: Theory and Python Implementation — defines the autocorrelation that fills both sides of the Yule-Walker equations and presents the FFT-based fast computation.
- Cepstrum Analysis: Theory and Python Implementation — the FFT-based cepstrum that complements the LPC cepstrum from Schroeder’s recursion.
- Fast Fourier Transform (FFT): Mechanism and Python Implementation — the FFT/DFT used to visualize LPC spectral envelopes and to compute autocorrelation efficiently.
- Window Functions and Power Spectral Density (PSD): Theory and Python Implementation — windowing that precedes LPC estimation and PSD estimation via Welch’s method.
- Wiener Filter: Theory and Python Implementation — another classical least-squares-derived linear filter, closely related to the LPC formulation.
- Time-Series Forecasting with ARIMA Models — applications of the same AR family from the time-series forecasting side.
References
- Atal, B. S., & Hanauer, S. L. (1971). “Speech analysis and synthesis by linear prediction of the speech wave.” Journal of the Acoustical Society of America, 50(2B), 637-655.
- Itakura, F., & Saito, S. (1968). “Analysis synthesis telephony based on the maximum likelihood method.” Proceedings of the 6th International Congress on Acoustics, C-17-20.
- Makhoul, J. (1975). “Linear prediction: A tutorial review.” Proceedings of the IEEE, 63(4), 561-580.
- Markel, J. D., & Gray, A. H. (1976). Linear Prediction of Speech. Springer-Verlag.
- Rabiner, L. R., & Schafer, R. W. (2010). Theory and Applications of Digital Speech Processing. Pearson.
- SciPy linalg.solve_toeplitz documentation
- Librosa: librosa.lpc