What Are Adaptive Filters?
An adaptive filter automatically adjusts its coefficients to track changes in the statistical properties of its input signals. Unlike fixed-parameter filters such as the EMA or Butterworth filter, adaptive filters can track the optimal solution even in unknown or time-varying environments.
Key Applications
- Noise cancellation: Removing environmental noise captured by microphones (ANC headphones)
- Echo cancellation: Eliminating echoes in phone calls and video conferences
- System identification: Estimating the transfer function of an unknown system
- Channel equalization: Compensating for transmission path distortion in communications
Relationship to the Wiener Filter
The theoretical foundation of adaptive filters is the Wiener filter. The Wiener filter minimizes the mean squared error (MSE) when estimating a desired signal \(d(n)\) from an input signal \(\mathbf{x}(n)\).
With filter output \(y(n) = \mathbf{w}^T \mathbf{x}(n)\) and error \(e(n) = d(n) - y(n)\), the MSE is:
\[ J(\mathbf{w}) = E[e^2(n)] = E[(d(n) - \mathbf{w}^T \mathbf{x}(n))^2] \tag{1} \]

The optimal solution (Wiener-Hopf solution) is:

\[ \mathbf{w}^* = \mathbf{R}^{-1} \mathbf{p} \tag{2} \]

where \(\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]\) is the input autocorrelation matrix and \(\mathbf{p} = E[\mathbf{x}(n)d(n)]\) is the cross-correlation vector. Since \(\mathbf{R}\) and \(\mathbf{p}\) are typically unknown or time-varying, adaptive algorithms iteratively approach the Wiener solution.
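As a numerical sanity check, (2) can be estimated from sample statistics. The sketch below identifies a hypothetical 3-tap system from noisy white-noise data; the system coefficients, noise level, and signal length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 3                                   # filter order (illustrative)
N = 20000                               # number of samples
h = np.array([0.7, -0.2, 0.1])          # hypothetical unknown system

x = rng.standard_normal(N)              # white input signal
noise = 0.1 * rng.standard_normal(N)    # observation noise
d = np.convolve(x, h, mode='full')[:N] + noise   # desired signal

# Tap-input matrix: row n is [x(n), x(n-1), x(n-2)]
X = np.column_stack([np.roll(x, k) for k in range(M)])
X[:M] = 0.0                             # discard wrapped-around samples

R = (X.T @ X) / N                       # sample autocorrelation matrix
p = (X.T @ d) / N                       # sample cross-correlation vector
w_opt = np.linalg.solve(R, p)           # Wiener-Hopf solution (2)

print(w_opt)  # close to h = [0.7, -0.2, 0.1]
```

Solving \(\mathbf{Rw} = \mathbf{p}\) directly like this requires all the data in advance; the adaptive algorithms below reach the same solution sample by sample.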
LMS (Least Mean Squares) Algorithm
The LMS algorithm approximates the MSE gradient with an instantaneous estimate, making it a form of stochastic gradient descent built on the same principle as the SGD optimizers used in machine learning.
Derivation
The gradient of \(J(\mathbf{w})\) is:
\[ \nabla J(\mathbf{w}) = -2E[\mathbf{x}(n)e(n)] \tag{3} \]

LMS replaces the expectation with its instantaneous value:

\[ \hat{\nabla} J(\mathbf{w}) = -2\mathbf{x}(n)e(n) \tag{4} \]

The update rule becomes:

\[ \mathbf{w}(n+1) = \mathbf{w}(n) + \mu \mathbf{x}(n) e(n) \tag{5} \]

where \(\mu\) is the step size (learning rate).
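As a minimal illustration of update (5) before the full implementation, here is a hypothetical noise-free one-tap system (all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
w_true = 0.8       # hypothetical unknown one-tap system
mu = 0.05          # step size
w = 0.0            # initial coefficient

for n in range(2000):
    x_n = rng.standard_normal()   # input sample
    d_n = w_true * x_n            # desired signal (noise-free)
    e_n = d_n - w * x_n           # error
    w = w + mu * x_n * e_n        # LMS update, eq. (5)

print(w)  # converges toward w_true = 0.8
```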
Convergence Condition
For LMS to converge, the step size must satisfy:
\[ 0 < \mu < \frac{2}{\lambda_{\max}} \tag{6} \]

where \(\lambda_{\max}\) is the largest eigenvalue of \(\mathbf{R}\). In practice:

\[ 0 < \mu < \frac{2}{M \cdot \sigma_x^2} \tag{7} \]

where \(M\) is the filter order and \(\sigma_x^2\) is the input signal power.
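The two bounds can be compared numerically: since \(\lambda_{\max} \le \mathrm{tr}(\mathbf{R}) = M\sigma_x^2\), bound (7) is the more conservative of the two. A small sketch with an illustrative white-noise input:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 8                                   # filter order
x = rng.standard_normal(50_000)         # white input (illustrative)
sigma2 = np.var(x)                      # input signal power

# Sample autocorrelation matrix of the tap-input vector
X = np.column_stack([np.roll(x, k) for k in range(M)])
R = (X.T @ X) / len(x)

lam_max = np.linalg.eigvalsh(R).max()
mu_eig = 2.0 / lam_max                  # exact bound (6)
mu_practical = 2.0 / (M * sigma2)       # conservative bound (7)

# lam_max <= trace(R) (approximately M * sigma2),
# so the practical bound never exceeds the eigenvalue bound
print(mu_practical, mu_eig)
```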
NLMS (Normalized LMS) Algorithm
The convergence speed of LMS depends on the input signal power. NLMS removes this dependence by normalizing the step by the squared norm of the input vector:
\[ \mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\mu}{\|\mathbf{x}(n)\|^2 + \epsilon} \mathbf{x}(n) e(n) \tag{8} \]

where \(\epsilon\) is a small positive constant to prevent division by zero. With NLMS, convergence is guaranteed for \(0 < \mu < 2\).
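A consequence of the normalization in (8) is robustness to input scale: scaling the input inflates the plain LMS update but leaves the NLMS step bounded. A one-step sketch (the helper functions and values are illustrative):

```python
import numpy as np

def lms_step(w, x_vec, d, mu):
    """One plain LMS coefficient update, eq. (5)."""
    e = d - w @ x_vec
    return w + mu * e * x_vec

def nlms_step(w, x_vec, d, mu, eps=1e-8):
    """One NLMS coefficient update, eq. (8)."""
    e = d - w @ x_vec
    return w + (mu / (x_vec @ x_vec + eps)) * e * x_vec

w0 = np.zeros(4)
x_vec = np.array([1.0, -0.5, 0.3, 0.2])
d = 1.0

for scale in (1.0, 100.0):
    w_lms = lms_step(w0, scale * x_vec, d, mu=0.1)
    w_nlms = nlms_step(w0, scale * x_vec, d, mu=0.5)
    print(scale, np.linalg.norm(w_lms), np.linalg.norm(w_nlms))
# The LMS step grows with the input scale; the NLMS step stays bounded.
```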
RLS (Recursive Least Squares) Algorithm
RLS minimizes the weighted least squares error over all past data. It converges faster than LMS but has higher computational cost.
Cost Function
\[ J(\mathbf{w}, n) = \sum_{i=1}^{n} \lambda^{n-i} |e(i)|^2 \tag{9} \]

where \(\lambda\) (\(0 < \lambda \le 1\)) is the forgetting factor that exponentially reduces the influence of older data.
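Since the weights \(\lambda^{n-i}\) form a geometric series summing to \(1/(1-\lambda)\), the forgetting factor sets an effective memory of roughly \(1/(1-\lambda)\) samples. A quick check:

```python
import numpy as np

# The exponential weights lambda^k sum to 1/(1 - lambda),
# so lam = 0.99 weights roughly the last 100 samples.
for lam in (0.9, 0.99, 0.999):
    weight_sum = np.sum(lam ** np.arange(10000))
    print(lam, weight_sum, 1.0 / (1.0 - lam))
```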
Update Algorithm
- Gain vector computation:
\[ \mathbf{k}(n) = \frac{\mathbf{P}(n-1)\mathbf{x}(n)}{\lambda + \mathbf{x}^T(n)\mathbf{P}(n-1)\mathbf{x}(n)} \tag{10} \]
- Error computation:
\[ e(n) = d(n) - \mathbf{w}^T(n-1)\mathbf{x}(n) \tag{11} \]
- Coefficient update:
\[ \mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{k}(n)e(n) \tag{12} \]
- Inverse correlation matrix update:
\[ \mathbf{P}(n) = \frac{1}{\lambda}\left[\mathbf{P}(n-1) - \mathbf{k}(n)\mathbf{x}^T(n)\mathbf{P}(n-1)\right] \tag{13} \]
Initialize with \(\mathbf{P}(0) = \delta^{-1}\mathbf{I}\) where \(\delta\) is a small positive constant.
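One step of the \(\mathbf{P}\) recursion can be checked against the matrix inversion lemma: if \(\mathbf{P}(n-1)\) is the inverse of the weighted autocorrelation matrix \(\mathbf{\Phi}(n-1)\), where \(\mathbf{\Phi}(n) = \lambda\mathbf{\Phi}(n-1) + \mathbf{x}(n)\mathbf{x}^T(n)\), then the updated \(\mathbf{P}(n)\) is the inverse of \(\mathbf{\Phi}(n)\). A small numerical sketch (the matrix size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
M, lam = 4, 0.99

# Arbitrary positive-definite Phi and its inverse P
A = rng.standard_normal((M, M))
Phi = A @ A.T + M * np.eye(M)
P = np.linalg.inv(Phi)

x_vec = rng.standard_normal(M)

# One RLS update of P
Px = P @ x_vec
k = Px / (lam + x_vec @ Px)              # gain vector
P_new = (P - np.outer(k, x_vec @ P)) / lam

# Direct update of the weighted autocorrelation matrix
Phi_new = lam * Phi + np.outer(x_vec, x_vec)

print(np.max(np.abs(P_new @ Phi_new - np.eye(M))))  # near machine epsilon
```

Propagating \(\mathbf{P}(n)\) recursively is what lets RLS avoid an explicit \(O(M^3)\) matrix inversion at every step.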
Algorithm Comparison
| Property | LMS | NLMS | RLS |
|---|---|---|---|
| Complexity (per step) | \(O(M)\) | \(O(M)\) | \(O(M^2)\) |
| Memory | \(O(M)\) | \(O(M)\) | \(O(M^2)\) |
| Convergence speed | Slow | Moderate | Fast |
| Steady-state error | Step-size dependent | Step-size dependent | Small |
| Numerical stability | High | High | Moderate |
| Tuning parameters | \(\mu\) | \(\mu\) | \(\lambda\), \(\delta\) |
| Best for | Real-time processing | Varying input power | Fast convergence needed |
Python Implementation
LMS / NLMS
import numpy as np

def lms_filter(x, d, M, mu):
    """Adaptive filter using the LMS algorithm

    Parameters
    ----------
    x : array_like
        Input signal (reference signal)
    d : array_like
        Desired signal
    M : int
        Filter order
    mu : float
        Step size

    Returns
    -------
    y : ndarray
        Filter output
    e : ndarray
        Error signal
    w_history : ndarray
        Filter coefficient history
    """
    N = len(x)
    w = np.zeros(M)
    y = np.zeros(N)
    e = np.zeros(N)
    w_history = np.zeros((N, M))
    for n in range(M, N):
        x_vec = x[n:n-M:-1]          # most recent M samples, newest first
        y[n] = np.dot(w, x_vec)      # filter output
        e[n] = d[n] - y[n]           # error signal
        w = w + mu * e[n] * x_vec    # LMS update (5)
        w_history[n] = w
    return y, e, w_history
def nlms_filter(x, d, M, mu=0.5, eps=1e-8):
    """Adaptive filter using the NLMS algorithm"""
    N = len(x)
    w = np.zeros(M)
    y = np.zeros(N)
    e = np.zeros(N)
    for n in range(M, N):
        x_vec = x[n:n-M:-1]                   # most recent M samples
        y[n] = np.dot(w, x_vec)
        e[n] = d[n] - y[n]
        norm = np.dot(x_vec, x_vec) + eps     # input energy plus eps
        w = w + (mu / norm) * e[n] * x_vec    # normalized update (8)
    return y, e
RLS
def rls_filter(x, d, M, lam=0.99, delta=1.0):
    """Adaptive filter using the RLS algorithm

    Parameters
    ----------
    x : array_like
        Input signal
    d : array_like
        Desired signal
    M : int
        Filter order
    lam : float
        Forgetting factor (0 < lam <= 1)
    delta : float
        Initial scaling for the inverse correlation matrix

    Returns
    -------
    y : ndarray
        Filter output
    e : ndarray
        Error signal
    """
    N = len(x)
    w = np.zeros(M)
    P = np.eye(M) / delta        # P(0) = delta^{-1} I
    y = np.zeros(N)
    e = np.zeros(N)
    for n in range(M, N):
        x_vec = x[n:n-M:-1]      # most recent M samples, newest first
        y[n] = np.dot(w, x_vec)
        e[n] = d[n] - y[n]
        # Gain vector
        Px = P @ x_vec
        k = Px / (lam + x_vec @ Px)
        # Update coefficients
        w = w + k * e[n]
        # Update inverse correlation matrix
        P = (P - np.outer(k, x_vec @ P)) / lam
    return y, e
Practical Example: Noise Cancellation
A classic application of adaptive filters is noise cancellation.
import matplotlib.pyplot as plt
np.random.seed(42)
# Signal generation
N = 1000
t = np.arange(N)
signal = np.sin(2 * np.pi * 0.01 * t) # Desired signal
noise_ref = np.random.randn(N) # Reference noise
# Noise passed through an unknown system (simulated with FIR filter)
h_true = np.array([0.5, -0.3, 0.2, -0.1])
noise_filtered = np.convolve(noise_ref, h_true, mode='full')[:N]
# Observed signal = desired signal + filtered noise
observed = signal + noise_filtered
# Noise removal with each algorithm
M = 8 # Filter order
y_lms, e_lms, _ = lms_filter(noise_ref, observed, M, mu=0.01)
y_nlms, e_nlms = nlms_filter(noise_ref, observed, M, mu=0.5)
y_rls, e_rls = rls_filter(noise_ref, observed, M, lam=0.99)
# Plot
fig, axes = plt.subplots(4, 1, figsize=(12, 10), sharex=True)
axes[0].plot(t, signal, 'g', label='Original signal')
axes[0].plot(t, observed, 'r', alpha=0.5, label='Observed (signal + noise)')
axes[0].set_title('Input Signals')
axes[0].legend()
axes[1].plot(t, signal, 'g', alpha=0.5, label='Original')
axes[1].plot(t, e_lms, 'b', label='LMS output')
axes[1].set_title('LMS Noise Cancellation')
axes[1].legend()
axes[2].plot(t, signal, 'g', alpha=0.5, label='Original')
axes[2].plot(t, e_nlms, 'b', label='NLMS output')
axes[2].set_title('NLMS Noise Cancellation')
axes[2].legend()
axes[3].plot(t, signal, 'g', alpha=0.5, label='Original')
axes[3].plot(t, e_rls, 'b', label='RLS output')
axes[3].set_title('RLS Noise Cancellation')
axes[3].legend()
plt.tight_layout()
plt.show()
In noise cancellation, the error signal \(e(n)\) is the noise-removed output: the adaptive filter learns to predict the noise component of the observation from the reference, and subtracting that prediction leaves the desired signal. RLS converges faster than LMS and removes most of the noise from the early samples onward.
Learning Curve Visualization
Compare convergence speeds by plotting the learning curves (MSE over time).
# Learning curves (moving average of MSE)
window = 50
mse_lms = np.convolve(e_lms**2, np.ones(window)/window, mode='valid')
mse_nlms = np.convolve(e_nlms**2, np.ones(window)/window, mode='valid')
mse_rls = np.convolve(e_rls**2, np.ones(window)/window, mode='valid')
plt.figure(figsize=(10, 5))
plt.semilogy(mse_lms, label='LMS')
plt.semilogy(mse_nlms, label='NLMS')
plt.semilogy(mse_rls, label='RLS')
plt.xlabel('Sample')
plt.ylabel('MSE')
plt.title('Learning Curves')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
Related Articles
- Exponential Moving Average (EMA) Filter Frequency Response - A representative fixed-parameter filter. Understanding EMA helps contrast with adaptive approaches.
- Wiener Filter: Theory and Python Implementation - The theoretical foundation for adaptive filters, particularly LMS.
- Kalman Filter: Theory and Python Implementation - A sequential estimation algorithm closely related to RLS.
- FIR and IIR Filter Comparison - Most adaptive filters use FIR structure. Explains the characteristics of FIR vs IIR.
- Fundamentals of Filtering in Signal Processing - A systematic overview of Kalman filter, EKF, UKF, and particle filter.
- SGD and Adam: Theory and Comparison - Optimization algorithms in deep learning based on the same stochastic gradient descent framework as LMS.
- Moving Average Filter Types and Comparison - Representative fixed filters to contrast with adaptive filters.
- Low-Pass Filter Design and Comparison - Compares frequency responses of various low-pass filters.
- Unscented Kalman Filter (UKF): Theory and Python Implementation - A nonlinear extension of the Kalman filter, which is itself closely related to RLS.
References
- Haykin, S. (2014). Adaptive Filter Theory (5th ed.). Pearson.
- Widrow, B., & Stearns, S. D. (1985). Adaptive Signal Processing. Prentice-Hall.
- Sayed, A. H. (2008). Adaptive Filters. Wiley-IEEE Press.