Adaptive Filters (LMS/RLS): Theory and Python Implementation

A comprehensive guide to adaptive filters covering LMS, NLMS, and RLS algorithms with mathematical derivations, Python implementations, and a practical noise cancellation example.

What Are Adaptive Filters?

An adaptive filter automatically adjusts its coefficients to track changes in the statistical properties of input signals. Unlike fixed-parameter filters such as an exponential moving average (EMA) or a Butterworth filter, adaptive filters can approach optimal performance in unknown or dynamically changing environments.

Key Applications

  • Noise cancellation: Removing environmental noise captured by microphones (ANC headphones)
  • Echo cancellation: Eliminating echoes in phone calls and video conferences
  • System identification: Estimating the transfer function of an unknown system
  • Channel equalization: Compensating for transmission path distortion in communications

Relationship to the Wiener Filter

The theoretical foundation of adaptive filters is the Wiener filter. The Wiener filter minimizes the mean squared error (MSE) when estimating a desired signal \(d(n)\) from an input signal \(\mathbf{x}(n)\).

With filter output \(y(n) = \mathbf{w}^T \mathbf{x}(n)\) and error \(e(n) = d(n) - y(n)\), the MSE is:

\[ J(\mathbf{w}) = E[e^2(n)] = E[(d(n) - \mathbf{w}^T \mathbf{x}(n))^2] \tag{1} \]

The optimal solution (Wiener-Hopf solution) is:

\[ \mathbf{w}^* = \mathbf{R}^{-1} \mathbf{p} \tag{2} \]

where \(\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]\) is the input autocorrelation matrix and \(\mathbf{p} = E[\mathbf{x}(n)d(n)]\) is the cross-correlation vector. Since \(\mathbf{R}\) and \(\mathbf{p}\) are typically unknown or time-varying, adaptive algorithms iteratively approach the Wiener solution.
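As a concrete check, the Wiener solution can be computed from sample estimates of \(\mathbf{R}\) and \(\mathbf{p}\). The sketch below assumes a hypothetical 2-tap system `w_true` and a white-noise input, neither of which appears elsewhere in this article:

```python
import numpy as np

# Numerical sketch of the Wiener-Hopf solution, Eq. (2), on synthetic data.
# w_true is a hypothetical 2-tap FIR system chosen only for this example.
rng = np.random.default_rng(1)
w_true = np.array([0.8, -0.4])
N = 50_000
x = rng.standard_normal(N)
X = np.column_stack([x, np.concatenate(([0.0], x[:-1]))])  # rows are [x(n), x(n-1)]
d = X @ w_true + 0.1 * rng.standard_normal(N)              # desired signal + noise

R = X.T @ X / N                 # sample estimate of R = E[x(n) x^T(n)]
p = X.T @ d / N                 # sample estimate of p = E[x(n) d(n)]
w_opt = np.linalg.solve(R, p)   # Wiener solution w* = R^{-1} p
print(w_opt)                    # close to w_true = [0.8, -0.4]
```

With enough data the sample estimates converge to the true statistics, so `w_opt` lands on `w_true` up to the noise floor; adaptive algorithms reach the same point without ever forming \(\mathbf{R}\) explicitly.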

LMS (Least Mean Squares) Algorithm

The LMS algorithm approximates the MSE gradient with an instantaneous estimate at each sample, making it a form of stochastic gradient descent.

Derivation

The gradient of \(J(\mathbf{w})\) is:

\[ \nabla J(\mathbf{w}) = -2E[\mathbf{x}(n)e(n)] \tag{3} \]

LMS replaces the expectation with instantaneous values:

\[ \hat{\nabla} J(\mathbf{w}) = -2\mathbf{x}(n)e(n) \tag{4} \]

The update rule becomes:

\[ \mathbf{w}(n+1) = \mathbf{w}(n) + \mu \mathbf{x}(n) e(n) \tag{5} \]

where \(\mu\) is the step size (learning rate).

Convergence Condition

For LMS to converge, the step size must satisfy:

\[ 0 < \mu < \frac{2}{\lambda_{\max}} \tag{6} \]

where \(\lambda_{\max}\) is the largest eigenvalue of \(\mathbf{R}\). In practice:

\[ 0 < \mu < \frac{2}{M \cdot \sigma_x^2} \tag{7} \]

where \(M\) is the filter order and \(\sigma_x^2\) is the input signal power.
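A quick sketch of how this bound is used in practice; the input power and filter order below are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(2)
x = 2.0 * rng.standard_normal(10_000)  # example input with power sigma_x^2 ≈ 4
M = 8                                  # filter order
sigma2 = np.mean(x**2)                 # estimate of the input signal power
mu_max = 2.0 / (M * sigma2)            # practical upper bound, Eq. (7)
mu = 0.1 * mu_max                      # conservative choice well inside the bound
print(mu_max, mu)
```

Choosing \(\mu\) an order of magnitude below the bound trades some convergence speed for a smaller steady-state error.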

NLMS (Normalized LMS) Algorithm

The convergence behavior of LMS depends on the input signal power, so a step size tuned for one power level may be too slow or unstable at another. NLMS removes this dependence by normalizing the update by the squared norm of the input vector:

\[ \mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\mu}{\|\mathbf{x}(n)\|^2 + \epsilon} \mathbf{x}(n) e(n) \tag{8} \]

where \(\epsilon\) is a small positive constant to prevent division by zero. With NLMS, convergence is guaranteed for \(0 < \mu < 2\).
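The normalization is what makes NLMS insensitive to input scale. A minimal single-update sketch (the values here are arbitrary): scaling the input by 100 leaves the output's progress toward \(d(n)\) unchanged.

```python
import numpy as np

mu, eps, M = 0.5, 1e-8, 4
rng = np.random.default_rng(3)
x = rng.standard_normal(M)
d = 1.0

def nlms_step(w, x_vec, d_n):
    # One NLMS update, Eq. (8)
    e = d_n - w @ x_vec
    return w + (mu / (x_vec @ x_vec + eps)) * e * x_vec

w1 = nlms_step(np.zeros(M), x, d)        # original input
w2 = nlms_step(np.zeros(M), 100 * x, d)  # same input, 100x louder
# After one update from w = 0, the output has moved mu * e toward d in both cases:
print(w1 @ x, w2 @ (100 * x))  # both ≈ mu * d = 0.5
```

An unnormalized LMS step in the same situation would move \(10^4\) times farther for the louder input, which is exactly the sensitivity NLMS is designed to remove.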

RLS (Recursive Least Squares) Algorithm

RLS minimizes the weighted least squares error over all past data. It converges faster than LMS but has higher computational cost.

Cost Function

\[ J(\mathbf{w}, n) = \sum_{i=1}^{n} \lambda^{n-i} |e(i)|^2 \tag{9} \]

where \(\lambda\) (\(0 < \lambda \le 1\)) is the forgetting factor that exponentially reduces the influence of older data.

Update Algorithm

  1. Gain vector computation:
\[ \mathbf{k}(n) = \frac{\mathbf{P}(n-1)\mathbf{x}(n)}{\lambda + \mathbf{x}^T(n)\mathbf{P}(n-1)\mathbf{x}(n)} \tag{10} \]
  2. Error computation:
\[ e(n) = d(n) - \mathbf{w}^T(n-1)\mathbf{x}(n) \tag{11} \]
  3. Coefficient update:
\[ \mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{k}(n) e(n) \tag{12} \]
  4. Inverse correlation matrix update:
\[ \mathbf{P}(n) = \frac{1}{\lambda}\left[\mathbf{P}(n-1) - \mathbf{k}(n)\mathbf{x}^T(n)\mathbf{P}(n-1)\right] \tag{13} \]

Initialize with \(\mathbf{P}(0) = \delta^{-1}\mathbf{I}\) where \(\delta\) is a small positive constant.
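Equations (10) and (13) are an application of the matrix inversion lemma: \(\mathbf{P}(n)\) stays exactly equal to the inverse of the exponentially weighted autocorrelation matrix, including the \(\delta\) regularization. A small sketch verifying this numerically, with arbitrary dimensions and random data:

```python
import numpy as np

rng = np.random.default_rng(0)
M, lam, delta = 4, 0.95, 1e-2
P = np.eye(M) / delta      # P(0) = delta^{-1} I
Phi = delta * np.eye(M)    # Phi(0) = delta I

for _ in range(50):
    x = rng.standard_normal(M)
    Phi = lam * Phi + np.outer(x, x)      # weighted autocorrelation recursion
    k = P @ x / (lam + x @ P @ x)         # gain vector, Eq. (10)
    P = (P - np.outer(k, x @ P)) / lam    # inverse update, Eq. (13)

print(np.allclose(P, np.linalg.inv(Phi)))  # True: P(n) tracks Phi(n)^{-1}
```

This identity is why RLS achieves a least-squares solution at every step with \(O(M^2)\) work instead of the \(O(M^3)\) cost of inverting \(\boldsymbol{\Phi}(n)\) directly.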

Algorithm Comparison

| Property | LMS | NLMS | RLS |
|---|---|---|---|
| Complexity (per step) | \(O(M)\) | \(O(M)\) | \(O(M^2)\) |
| Memory | \(O(M)\) | \(O(M)\) | \(O(M^2)\) |
| Convergence speed | Slow | Moderate | Fast |
| Steady-state error | Step-size dependent | Step-size dependent | Small |
| Numerical stability | High | High | Moderate |
| Tuning parameters | \(\mu\) | \(\mu\) | \(\lambda\), \(\delta\) |
| Best for | Real-time processing | Varying input power | Fast convergence needed |

Python Implementation

LMS / NLMS

import numpy as np

def lms_filter(x, d, M, mu):
    """Adaptive filter using LMS algorithm

    Parameters
    ----------
    x : array_like
        Input signal (reference signal)
    d : array_like
        Desired signal
    M : int
        Filter order
    mu : float
        Step size

    Returns
    -------
    y : ndarray
        Filter output
    e : ndarray
        Error signal
    w_history : ndarray
        Filter coefficient history
    """
    N = len(x)
    w = np.zeros(M)
    y = np.zeros(N)
    e = np.zeros(N)
    w_history = np.zeros((N, M))

    for n in range(M, N):
        x_vec = x[n:n-M:-1]  # most recent M samples, newest first
        y[n] = np.dot(w, x_vec)
        e[n] = d[n] - y[n]
        w = w + mu * e[n] * x_vec
        w_history[n] = w

    return y, e, w_history

def nlms_filter(x, d, M, mu=0.5, eps=1e-8):
    """Adaptive filter using NLMS algorithm"""
    N = len(x)
    w = np.zeros(M)
    y = np.zeros(N)
    e = np.zeros(N)

    for n in range(M, N):
        x_vec = x[n:n-M:-1]  # most recent M samples, newest first
        y[n] = np.dot(w, x_vec)
        e[n] = d[n] - y[n]
        norm = np.dot(x_vec, x_vec) + eps
        w = w + (mu / norm) * e[n] * x_vec

    return y, e
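A quick sanity check of the update rule: identify a known FIR system, one of the applications listed earlier. The loop below condenses `lms_filter` into its essentials; the hypothetical 3-tap system and the step size are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(4)
h_true = np.array([0.5, -0.3, 0.2])    # hypothetical "unknown" system
N, M, mu = 5000, 3, 0.05               # mu is well below 2 / (M * sigma_x^2)
x = rng.standard_normal(N)
d = np.convolve(x, h_true)[:N]         # system output serves as the desired signal

w = np.zeros(M)
for n in range(M, N):
    x_vec = x[n:n-M:-1]                # most recent M samples, newest first
    e = d[n] - w @ x_vec               # a-priori error
    w = w + mu * e * x_vec             # LMS update, Eq. (5)

print(w)  # converges to h_true = [0.5, -0.3, 0.2]
```

Because the data here are noiseless, the coefficients settle on `h_true` essentially exactly; with measurement noise the steady-state misadjustment would grow with \(\mu\).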

RLS

def rls_filter(x, d, M, lam=0.99, delta=1.0):
    """Adaptive filter using RLS algorithm

    Parameters
    ----------
    x : array_like
        Input signal
    d : array_like
        Desired signal
    M : int
        Filter order
    lam : float
        Forgetting factor (0 < lam <= 1)
    delta : float
        Initial scaling for inverse correlation matrix

    Returns
    -------
    y : ndarray
        Filter output
    e : ndarray
        Error signal
    """
    N = len(x)
    w = np.zeros(M)
    P = np.eye(M) / delta
    y = np.zeros(N)
    e = np.zeros(N)

    for n in range(M, N):
        x_vec = x[n:n-M:-1]  # most recent M samples, newest first
        y[n] = np.dot(w, x_vec)
        e[n] = d[n] - y[n]

        # Gain vector
        Px = P @ x_vec
        k = Px / (lam + x_vec @ Px)

        # Update coefficients
        w = w + k * e[n]

        # Update inverse correlation matrix
        P = (P - np.outer(k, x_vec @ P)) / lam

    return y, e
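The same kind of sanity check, condensed from `rls_filter` above: on a noiseless identification task (a hypothetical 3-tap system, chosen only for this sketch) the coefficients lock onto the true system within a few dozen samples.

```python
import numpy as np

rng = np.random.default_rng(5)
h_true = np.array([0.5, -0.3, 0.2])    # hypothetical "unknown" system
N, M, lam, delta = 200, 3, 0.99, 0.01
x = rng.standard_normal(N)
d = np.convolve(x, h_true)[:N]

w = np.zeros(M)
P = np.eye(M) / delta
for n in range(M, N):
    x_vec = x[n:n-M:-1]                        # most recent M samples, newest first
    e = d[n] - w @ x_vec                       # a-priori error, Eq. (11)
    k = P @ x_vec / (lam + x_vec @ P @ x_vec)  # gain vector, Eq. (10)
    w = w + k * e                              # coefficient update, Eq. (12)
    P = (P - np.outer(k, x_vec @ P)) / lam     # inverse correlation update, Eq. (13)

print(w)  # ≈ [0.5, -0.3, 0.2] after only 200 samples
```

Note how little data RLS needs compared with a stochastic-gradient method: each step solves the full weighted least-squares problem over the history, at \(O(M^2)\) cost per sample.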

Practical Example: Noise Cancellation

A classic application of adaptive filters is noise cancellation.

import matplotlib.pyplot as plt

np.random.seed(42)

# Signal generation
N = 1000
t = np.arange(N)
signal = np.sin(2 * np.pi * 0.01 * t)  # Desired signal
noise_ref = np.random.randn(N)  # Reference noise

# Noise passed through an unknown system (simulated with FIR filter)
h_true = np.array([0.5, -0.3, 0.2, -0.1])
noise_filtered = np.convolve(noise_ref, h_true, mode='full')[:N]

# Observed signal = desired signal + filtered noise
observed = signal + noise_filtered

# Noise removal with each algorithm
M = 8  # Filter order
y_lms, e_lms, _ = lms_filter(noise_ref, observed, M, mu=0.01)
y_nlms, e_nlms = nlms_filter(noise_ref, observed, M, mu=0.5)
y_rls, e_rls = rls_filter(noise_ref, observed, M, lam=0.99)

# Plot
fig, axes = plt.subplots(4, 1, figsize=(12, 10), sharex=True)

axes[0].plot(t, signal, 'g', label='Original signal')
axes[0].plot(t, observed, 'r', alpha=0.5, label='Observed (signal + noise)')
axes[0].set_title('Input Signals')
axes[0].legend()

axes[1].plot(t, signal, 'g', alpha=0.5, label='Original')
axes[1].plot(t, e_lms, 'b', label='LMS output')
axes[1].set_title('LMS Noise Cancellation')
axes[1].legend()

axes[2].plot(t, signal, 'g', alpha=0.5, label='Original')
axes[2].plot(t, e_nlms, 'b', label='NLMS output')
axes[2].set_title('NLMS Noise Cancellation')
axes[2].legend()

axes[3].plot(t, signal, 'g', alpha=0.5, label='Original')
axes[3].plot(t, e_rls, 'b', label='RLS output')
axes[3].set_title('RLS Noise Cancellation')
axes[3].legend()

plt.tight_layout()
plt.show()

In noise cancellation, the error signal \(e(n)\) is the noise-removed output: the adaptive filter learns to predict the noise component of the observed signal from the reference, and subtracting that prediction leaves the desired signal. RLS converges faster than LMS, so it removes the noise effectively from the very first samples.

Learning Curve Visualization

Compare convergence speeds by plotting the learning curves (MSE over time).

# Learning curves (moving average of MSE)
window = 50
mse_lms = np.convolve(e_lms**2, np.ones(window)/window, mode='valid')
mse_nlms = np.convolve(e_nlms**2, np.ones(window)/window, mode='valid')
mse_rls = np.convolve(e_rls**2, np.ones(window)/window, mode='valid')

plt.figure(figsize=(10, 5))
plt.semilogy(mse_lms, label='LMS')
plt.semilogy(mse_nlms, label='NLMS')
plt.semilogy(mse_rls, label='RLS')
plt.xlabel('Sample')
plt.ylabel('MSE')
plt.title('Learning Curves')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

References

  • Haykin, S. (2014). Adaptive Filter Theory (5th ed.). Pearson.
  • Widrow, B., & Stearns, S. D. (1985). Adaptive Signal Processing. Prentice-Hall.
  • Sayed, A. H. (2008). Adaptive Filters. Wiley-IEEE Press.