Adaptive Filters (LMS/RLS): Theory and Python Implementation

A comprehensive guide to adaptive filters covering LMS, NLMS, and RLS algorithms with mathematical derivations, Python implementations, and a practical noise cancellation example.

What Are Adaptive Filters?

An adaptive filter automatically adjusts its coefficients to track changes in the statistical properties of input signals. Unlike fixed-parameter filters such as an exponential moving average (EMA) or a Butterworth filter, adaptive filters can approach optimal performance in unknown or dynamically changing environments.

Key Applications

  • Noise cancellation: Removing environmental noise captured by microphones (ANC headphones)
  • Echo cancellation: Eliminating echoes in phone calls and video conferences
  • System identification: Estimating the transfer function of an unknown system
  • Channel equalization: Compensating for transmission path distortion in communications

Relationship to the Wiener Filter

The theoretical foundation of adaptive filters is the Wiener filter. The Wiener filter minimizes the mean squared error (MSE) when estimating a desired signal \(d(n)\) from an input signal \(\mathbf{x}(n)\).

With filter output \(y(n) = \mathbf{w}^T \mathbf{x}(n)\) and error \(e(n) = d(n) - y(n)\), the MSE is:

\[ J(\mathbf{w}) = E[e^2(n)] = E[(d(n) - \mathbf{w}^T \mathbf{x}(n))^2] \tag{1} \]

The optimal solution (Wiener-Hopf solution) is:

\[ \mathbf{w}^* = \mathbf{R}^{-1} \mathbf{p} \tag{2} \]

where \(\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]\) is the input autocorrelation matrix and \(\mathbf{p} = E[\mathbf{x}(n)d(n)]\) is the cross-correlation vector. Since \(\mathbf{R}\) and \(\mathbf{p}\) are typically unknown or time-varying, adaptive algorithms iteratively approach the Wiener solution.
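As a concrete check, the Wiener solution can be computed from sample estimates of \(\mathbf{R}\) and \(\mathbf{p}\). The sketch below assumes a hypothetical 2-tap system `w_true` and a white-noise input, neither of which appears elsewhere in this article:

```python
import numpy as np

# Numerical sketch of the Wiener-Hopf solution, Eq. (2), on synthetic data.
# w_true is a hypothetical 2-tap FIR system chosen only for this example.
rng = np.random.default_rng(1)
w_true = np.array([0.8, -0.4])
N = 50_000
x = rng.standard_normal(N)
X = np.column_stack([x, np.concatenate(([0.0], x[:-1]))])  # rows are [x(n), x(n-1)]
d = X @ w_true + 0.1 * rng.standard_normal(N)              # desired signal + noise

R = X.T @ X / N                 # sample estimate of R = E[x(n) x^T(n)]
p = X.T @ d / N                 # sample estimate of p = E[x(n) d(n)]
w_opt = np.linalg.solve(R, p)   # Wiener solution w* = R^{-1} p
print(w_opt)                    # close to w_true = [0.8, -0.4]
```

With enough data the sample estimates converge to the true statistics, so `w_opt` lands on `w_true` up to the noise floor; adaptive algorithms reach the same point without ever forming \(\mathbf{R}\) explicitly.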

LMS (Least Mean Squares) Algorithm

The LMS algorithm approximates the MSE gradient with an instantaneous estimate at each sample, making it a form of stochastic gradient descent.

Derivation

The gradient of \(J(\mathbf{w})\) is:

\[ \nabla J(\mathbf{w}) = -2E[\mathbf{x}(n)e(n)] \tag{3} \]

LMS replaces the expectation with instantaneous values:

\[ \hat{\nabla} J(\mathbf{w}) = -2\mathbf{x}(n)e(n) \tag{4} \]

The update rule becomes:

\[ \mathbf{w}(n+1) = \mathbf{w}(n) + \mu \mathbf{x}(n) e(n) \tag{5} \]

where \(\mu\) is the step size (learning rate).

Convergence Condition

For LMS to converge, the step size must satisfy:

\[ 0 < \mu < \frac{2}{\lambda_{\max}} \tag{6} \]

where \(\lambda_{\max}\) is the largest eigenvalue of \(\mathbf{R}\). In practice:

\[ 0 < \mu < \frac{2}{M \cdot \sigma_x^2} \tag{7} \]

where \(M\) is the filter order and \(\sigma_x^2\) is the input signal power.
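A quick sketch of how this bound is used in practice; the input power and filter order below are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(2)
x = 2.0 * rng.standard_normal(10_000)  # example input with power sigma_x^2 ≈ 4
M = 8                                  # filter order
sigma2 = np.mean(x**2)                 # estimate of the input signal power
mu_max = 2.0 / (M * sigma2)            # practical upper bound, Eq. (7)
mu = 0.1 * mu_max                      # conservative choice well inside the bound
print(mu_max, mu)
```

Choosing \(\mu\) an order of magnitude below the bound trades some convergence speed for a smaller steady-state error.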

NLMS (Normalized LMS) Algorithm

The convergence behavior of LMS depends on the input signal power, so a step size tuned for one power level may be too slow or unstable at another. NLMS removes this dependence by normalizing the update by the squared norm of the input vector:

\[ \mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\mu}{\|\mathbf{x}(n)\|^2 + \epsilon} \mathbf{x}(n) e(n) \tag{8} \]

where \(\epsilon\) is a small positive constant to prevent division by zero. With NLMS, convergence is guaranteed for \(0 < \mu < 2\).
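The normalization is what makes NLMS insensitive to input scale. A minimal single-update sketch (the values here are arbitrary): scaling the input by 100 leaves the output's progress toward \(d(n)\) unchanged.

```python
import numpy as np

mu, eps, M = 0.5, 1e-8, 4
rng = np.random.default_rng(3)
x = rng.standard_normal(M)
d = 1.0

def nlms_step(w, x_vec, d_n):
    # One NLMS update, Eq. (8)
    e = d_n - w @ x_vec
    return w + (mu / (x_vec @ x_vec + eps)) * e * x_vec

w1 = nlms_step(np.zeros(M), x, d)        # original input
w2 = nlms_step(np.zeros(M), 100 * x, d)  # same input, 100x louder
# After one update from w = 0, the output has moved mu * e toward d in both cases:
print(w1 @ x, w2 @ (100 * x))  # both ≈ mu * d = 0.5
```

An unnormalized LMS step in the same situation would move \(10^4\) times farther for the louder input, which is exactly the sensitivity NLMS is designed to remove.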

RLS (Recursive Least Squares) Algorithm

RLS minimizes the weighted least squares error over all past data. It converges faster than LMS but has higher computational cost.

Cost Function

\[ J(\mathbf{w}, n) = \sum_{i=1}^{n} \lambda^{n-i} |e(i)|^2 \tag{9} \]

where \(\lambda\) (\(0 < \lambda \le 1\)) is the forgetting factor that exponentially reduces the influence of older data.

Update Algorithm

  1. Gain vector computation:
\[ \mathbf{k}(n) = \frac{\mathbf{P}(n-1)\mathbf{x}(n)}{\lambda + \mathbf{x}^T(n)\mathbf{P}(n-1)\mathbf{x}(n)} \tag{10} \]
  2. Error computation:
\[ e(n) = d(n) - \mathbf{w}^T(n-1)\mathbf{x}(n) \tag{11} \]
  3. Coefficient update:
\[ \mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{k}(n) e(n) \tag{12} \]
  4. Inverse correlation matrix update:
\[ \mathbf{P}(n) = \frac{1}{\lambda}\left[\mathbf{P}(n-1) - \mathbf{k}(n)\mathbf{x}^T(n)\mathbf{P}(n-1)\right] \tag{13} \]

Initialize with \(\mathbf{P}(0) = \delta^{-1}\mathbf{I}\) where \(\delta\) is a small positive constant.
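Equations (10) and (13) are an application of the matrix inversion lemma: \(\mathbf{P}(n)\) stays exactly equal to the inverse of the exponentially weighted autocorrelation matrix, including the \(\delta\) regularization. A small sketch verifying this numerically, with arbitrary dimensions and random data:

```python
import numpy as np

rng = np.random.default_rng(0)
M, lam, delta = 4, 0.95, 1e-2
P = np.eye(M) / delta      # P(0) = delta^{-1} I
Phi = delta * np.eye(M)    # Phi(0) = delta I

for _ in range(50):
    x = rng.standard_normal(M)
    Phi = lam * Phi + np.outer(x, x)      # weighted autocorrelation recursion
    k = P @ x / (lam + x @ P @ x)         # gain vector, Eq. (10)
    P = (P - np.outer(k, x @ P)) / lam    # inverse update, Eq. (13)

print(np.allclose(P, np.linalg.inv(Phi)))  # True: P(n) tracks Phi(n)^{-1}
```

This identity is why RLS achieves a least-squares solution at every step with \(O(M^2)\) work instead of the \(O(M^3)\) cost of inverting \(\boldsymbol{\Phi}(n)\) directly.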

Algorithm Comparison

| Property | LMS | NLMS | RLS |
|---|---|---|---|
| Complexity (per step) | \(O(M)\) | \(O(M)\) | \(O(M^2)\) |
| Memory | \(O(M)\) | \(O(M)\) | \(O(M^2)\) |
| Convergence speed | Slow | Moderate | Fast |
| Steady-state error | Step-size dependent | Step-size dependent | Small |
| Numerical stability | High | High | Moderate |
| Tuning parameters | \(\mu\) | \(\mu\) | \(\lambda\), \(\delta\) |
| Best for | Real-time processing | Varying input power | Fast convergence needed |

Python Implementation

LMS / NLMS

import numpy as np

def lms_filter(x, d, M, mu):
    """Adaptive filter using LMS algorithm

    Parameters
    ----------
    x : array_like
        Input signal (reference signal)
    d : array_like
        Desired signal
    M : int
        Filter order
    mu : float
        Step size

    Returns
    -------
    y : ndarray
        Filter output
    e : ndarray
        Error signal
    w_history : ndarray
        Filter coefficient history
    """
    N = len(x)
    w = np.zeros(M)
    y = np.zeros(N)
    e = np.zeros(N)
    w_history = np.zeros((N, M))

    for n in range(M, N):
        x_vec = x[n:n-M:-1]  # most recent M samples, newest first
        y[n] = np.dot(w, x_vec)
        e[n] = d[n] - y[n]
        w = w + mu * e[n] * x_vec
        w_history[n] = w

    return y, e, w_history

def nlms_filter(x, d, M, mu=0.5, eps=1e-8):
    """Adaptive filter using NLMS algorithm"""
    N = len(x)
    w = np.zeros(M)
    y = np.zeros(N)
    e = np.zeros(N)

    for n in range(M, N):
        x_vec = x[n:n-M:-1]  # most recent M samples, newest first
        y[n] = np.dot(w, x_vec)
        e[n] = d[n] - y[n]
        norm = np.dot(x_vec, x_vec) + eps
        w = w + (mu / norm) * e[n] * x_vec

    return y, e
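A quick sanity check of the update rule: identify a known FIR system, one of the applications listed earlier. The loop below condenses `lms_filter` into its essentials; the hypothetical 3-tap system and the step size are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(4)
h_true = np.array([0.5, -0.3, 0.2])    # hypothetical "unknown" system
N, M, mu = 5000, 3, 0.05               # mu is well below 2 / (M * sigma_x^2)
x = rng.standard_normal(N)
d = np.convolve(x, h_true)[:N]         # system output serves as the desired signal

w = np.zeros(M)
for n in range(M, N):
    x_vec = x[n:n-M:-1]                # most recent M samples, newest first
    e = d[n] - w @ x_vec               # a-priori error
    w = w + mu * e * x_vec             # LMS update, Eq. (5)

print(w)  # converges to h_true = [0.5, -0.3, 0.2]
```

Because the data here are noiseless, the coefficients settle on `h_true` essentially exactly; with measurement noise the steady-state misadjustment would grow with \(\mu\).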

RLS

def rls_filter(x, d, M, lam=0.99, delta=1.0):
    """Adaptive filter using RLS algorithm

    Parameters
    ----------
    x : array_like
        Input signal
    d : array_like
        Desired signal
    M : int
        Filter order
    lam : float
        Forgetting factor (0 < lam <= 1)
    delta : float
        Initial scaling for inverse correlation matrix

    Returns
    -------
    y : ndarray
        Filter output
    e : ndarray
        Error signal
    """
    N = len(x)
    w = np.zeros(M)
    P = np.eye(M) / delta
    y = np.zeros(N)
    e = np.zeros(N)

    for n in range(M, N):
        x_vec = x[n:n-M:-1]  # most recent M samples, newest first
        y[n] = np.dot(w, x_vec)
        e[n] = d[n] - y[n]

        # Gain vector
        Px = P @ x_vec
        k = Px / (lam + x_vec @ Px)

        # Update coefficients
        w = w + k * e[n]

        # Update inverse correlation matrix
        P = (P - np.outer(k, x_vec @ P)) / lam

    return y, e
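The same kind of sanity check, condensed from `rls_filter` above: on a noiseless identification task (a hypothetical 3-tap system, chosen only for this sketch) the coefficients lock onto the true system within a few dozen samples.

```python
import numpy as np

rng = np.random.default_rng(5)
h_true = np.array([0.5, -0.3, 0.2])    # hypothetical "unknown" system
N, M, lam, delta = 200, 3, 0.99, 0.01
x = rng.standard_normal(N)
d = np.convolve(x, h_true)[:N]

w = np.zeros(M)
P = np.eye(M) / delta
for n in range(M, N):
    x_vec = x[n:n-M:-1]                        # most recent M samples, newest first
    e = d[n] - w @ x_vec                       # a-priori error, Eq. (11)
    k = P @ x_vec / (lam + x_vec @ P @ x_vec)  # gain vector, Eq. (10)
    w = w + k * e                              # coefficient update, Eq. (12)
    P = (P - np.outer(k, x_vec @ P)) / lam     # inverse correlation update, Eq. (13)

print(w)  # ≈ [0.5, -0.3, 0.2] after only 200 samples
```

Note how little data RLS needs compared with a stochastic-gradient method: each step solves the full weighted least-squares problem over the history, at \(O(M^2)\) cost per sample.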

Practical Example: Noise Cancellation

A classic application of adaptive filters is noise cancellation.

import matplotlib.pyplot as plt

np.random.seed(42)

# Signal generation
N = 1000
t = np.arange(N)
signal = np.sin(2 * np.pi * 0.01 * t)  # Desired signal
noise_ref = np.random.randn(N)  # Reference noise

# Noise passed through an unknown system (simulated with FIR filter)
h_true = np.array([0.5, -0.3, 0.2, -0.1])
noise_filtered = np.convolve(noise_ref, h_true, mode='full')[:N]

# Observed signal = desired signal + filtered noise
observed = signal + noise_filtered

# Noise removal with each algorithm
M = 8  # Filter order
y_lms, e_lms, _ = lms_filter(noise_ref, observed, M, mu=0.01)
y_nlms, e_nlms = nlms_filter(noise_ref, observed, M, mu=0.5)
y_rls, e_rls = rls_filter(noise_ref, observed, M, lam=0.99)

# Plot
fig, axes = plt.subplots(4, 1, figsize=(12, 10), sharex=True)

axes[0].plot(t, signal, 'g', label='Original signal')
axes[0].plot(t, observed, 'r', alpha=0.5, label='Observed (signal + noise)')
axes[0].set_title('Input Signals')
axes[0].legend()

axes[1].plot(t, signal, 'g', alpha=0.5, label='Original')
axes[1].plot(t, e_lms, 'b', label='LMS output')
axes[1].set_title('LMS Noise Cancellation')
axes[1].legend()

axes[2].plot(t, signal, 'g', alpha=0.5, label='Original')
axes[2].plot(t, e_nlms, 'b', label='NLMS output')
axes[2].set_title('NLMS Noise Cancellation')
axes[2].legend()

axes[3].plot(t, signal, 'g', alpha=0.5, label='Original')
axes[3].plot(t, e_rls, 'b', label='RLS output')
axes[3].set_title('RLS Noise Cancellation')
axes[3].legend()

plt.tight_layout()
plt.show()

In noise cancellation, the error signal \(e(n)\) is the noise-removed output: the adaptive filter learns to predict the noise component of the observed signal from the reference, and subtracting that prediction leaves the desired signal. RLS converges faster than LMS, so it removes the noise effectively from the very first samples.

Learning Curve Visualization

Compare convergence speeds by plotting the learning curves (MSE over time).

# Learning curves (moving average of MSE)
window = 50
mse_lms = np.convolve(e_lms**2, np.ones(window)/window, mode='valid')
mse_nlms = np.convolve(e_nlms**2, np.ones(window)/window, mode='valid')
mse_rls = np.convolve(e_rls**2, np.ones(window)/window, mode='valid')

plt.figure(figsize=(10, 5))
plt.semilogy(mse_lms, label='LMS')
plt.semilogy(mse_nlms, label='NLMS')
plt.semilogy(mse_rls, label='RLS')
plt.xlabel('Sample')
plt.ylabel('MSE')
plt.title('Learning Curves')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

References

  • Haykin, S. (2014). Adaptive Filter Theory (5th ed.). Pearson.
  • Widrow, B., & Stearns, S. D. (1985). Adaptive Signal Processing. Prentice-Hall.
  • Sayed, A. H. (2008). Adaptive Filters. Wiley-IEEE Press.