Introduction
The LMS (Least Mean Squares) algorithm iteratively approximates the Wiener-filter optimum using stochastic gradient descent. Its computational cost is just \(O(L)\) per sample, it is numerically stable, and it underpins echo cancellation, noise cancellation, channel equalization, and system identification in production systems.
The Wiener filter requires the input statistics, the autocorrelation matrix \(\mathbf{R}\) and the cross-correlation vector \(\mathbf{p}\), to be known in advance. LMS removes that assumption: it updates the filter coefficients directly from the observed samples, which makes it far more practical for real-time use.
Problem Formulation
Consider a length-\(L\) FIR adaptive filter. At time \(n\) define the input vector and coefficient vector
\[ \mathbf{x}(n) = [x(n), x(n-1), \dots, x(n-L+1)]^\top \tag{1} \] \[ \mathbf{w}(n) = [w_0(n), w_1(n), \dots, w_{L-1}(n)]^\top \tag{2} \]The filter output and error are
\[ y(n) = \mathbf{w}(n)^\top \mathbf{x}(n), \qquad e(n) = d(n) - y(n) \tag{3} \]where \(d(n)\) is the desired (reference) signal. We minimize the expected squared error
\[ J(\mathbf{w}) = E\bigl[e(n)^2\bigr] \tag{4} \]iteratively.
LMS Algorithm
Stochastic Gradient Approximation
The exact gradient of \(J(\mathbf{w})\) is
\[ \nabla J = -2\, E\bigl[e(n)\, \mathbf{x}(n)\bigr] \tag{5} \]but the expectation is unknown in practice. The key idea of LMS is to replace the expectation with the instantaneous value:
\[ \hat{\nabla} J(n) = -2\, e(n)\, \mathbf{x}(n) \tag{6} \]Substituting into steepest descent yields the LMS update rule:
\[ \mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e(n)\, \mathbf{x}(n) \tag{7} \]Here \(\mu > 0\) is the step size (learning rate); the factor of 2 is absorbed into \(\mu\) .
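As a minimal sketch of one iteration (the numbers below are arbitrary toy values; a full implementation follows in the Python section), Eqs. (3) and (7) translate directly into a few lines of NumPy:
import numpy as np

# One LMS iteration, Eqs. (3) and (7); toy values for illustration only.
L, mu = 4, 0.1
x = np.array([0.9, -0.2, 0.4, 1.1, 0.3])   # input samples x(0)..x(4)
d = np.array([0.5, 0.1, -0.3, 0.8, 0.2])   # desired signal d(0)..d(4)
w = np.zeros(L)                             # initial coefficients w(0)

n = 4
x_vec = x[n:n - L:-1]          # x(n), x(n-1), ..., x(n-L+1), Eq. (1)
e_n = d[n] - w @ x_vec         # error, Eq. (3)
w = w + mu * e_n * x_vec       # update, Eq. (7)
print(e_n, w)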
Convergence Conditions
For mean convergence (\(E[\mathbf{w}(n)] \to \mathbf{w}^\ast\) ), the step size must satisfy
\[ 0 < \mu < \frac{2}{\lambda_{\max}} \tag{8} \]where \(\lambda_{\max}\) is the largest eigenvalue of the input autocorrelation matrix \(\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}(n)^\top]\) . A more conservative practical bound uses the input power:
\[ 0 < \mu < \frac{2}{L \cdot P_x}, \qquad P_x = E\bigl[x(n)^2\bigr] \tag{9} \]which follows from \(\lambda_{\max} \le \mathrm{tr}(\mathbf{R}) = L P_x\) .
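For example, the Python experiment below uses a length-\(L = 16\) filter driven by unit-variance white noise, so \(P_x \approx 1\) and bound (9) gives \(0 < \mu < 2/16 = 0.125\); the step \(\mu = 0.01\) used there sits comfortably inside that range.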
The qualitative behavior:
| Step size \(\mu\) | Behavior |
|---|---|
| Small | Slow convergence, low steady-state error |
| Moderate | Balanced trade-off |
| Large | Fast convergence, high steady-state error |
| Above bound | Diverges (error grows unboundedly) |
The eigenvalue spread \(\chi(\mathbf{R}) = \lambda_{\max} / \lambda_{\min}\) governs convergence speed: highly colored inputs converge slowly. See Autocorrelation Function for details.
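A quick numerical check of this effect, with \(\mathbf{R}\) estimated from delay-embedded samples (the moving-average coloring below is an arbitrary choice, purely for illustration):
import numpy as np

# Eigenvalue spread chi(R) for white vs. colored input (illustrative sketch).
np.random.seed(0)
L = 16
white = np.random.randn(20000)
colored = np.convolve(white, np.ones(4) / 4, mode='same')  # arbitrary low-pass coloring

def eig_spread(x, L):
    X = np.array([x[n:n - L:-1] for n in range(L, len(x))])  # rows are x(n) vectors, Eq. (1)
    R = (X.T @ X) / X.shape[0]                                # sample estimate of R
    lam = np.linalg.eigvalsh(R)                               # eigenvalues, ascending
    return lam[-1] / lam[0]

print(f"white input:   chi(R) ≈ {eig_spread(white, L):.1f}")
print(f"colored input: chi(R) ≈ {eig_spread(colored, L):.1f}")
The much larger spread of the colored input is exactly what slows LMS down.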
NLMS Algorithm
Why Normalize
LMS convergence couples \(\mu\) with input power \(P_x\) . With non-stationary inputs (e.g., speech), a fixed \(\mu\) is simultaneously too small in quiet periods and too large in loud ones, which compromises stability.
NLMS (Normalized LMS) normalizes the step by the squared Euclidean norm of the input vector:
\[ \mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\tilde{\mu}}{\lVert \mathbf{x}(n) \rVert^2 + \varepsilon}\, e(n)\, \mathbf{x}(n) \tag{10} \]The normalized step \(\tilde{\mu} \in (0, 2)\) no longer depends on input power. The small constant \(\varepsilon\) guards against division by tiny norms.
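One way to see what the normalization buys is to compare the effective step \(\tilde{\mu} / (\lVert \mathbf{x}(n) \rVert^2 + \varepsilon)\) when the same input segment appears at two different levels; a small sketch with arbitrary values:
import numpy as np

# Effective NLMS step under an input-level change (arbitrary illustrative values).
np.random.seed(0)
L, mu_tilde, eps = 16, 0.5, 1e-6
x_quiet = 0.1 * np.random.randn(L)   # low-power input segment
x_loud = 10.0 * x_quiet              # same segment, 20 dB louder

for name, x_vec in [("quiet", x_quiet), ("loud", x_loud)]:
    step = mu_tilde / (x_vec @ x_vec + eps)
    print(f"{name}: effective step = {step:.4g}, step * ||x||^2 = {step * (x_vec @ x_vec):.3f}")
The effective step adapts to the input energy so that the product of step and \(\lVert \mathbf{x}(n) \rVert^2\) stays at \(\tilde{\mu}\); plain LMS with a fixed \(\mu\) has no such compensation.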
LMS vs NLMS
| Feature | LMS | NLMS |
|---|---|---|
| Cost | \(O(L)\) | \(O(L)\) |
| Step-size bound | \(\mu < 2/(L P_x)\) (depends on input power) | \(\tilde{\mu} < 2\) (independent of input power) |
| Robustness to input-level changes | Weak | Strong |
| Implementation effort | Minimal | LMS plus a normalization term |
Python Implementation
System Identification Example
We identify an unknown FIR system \(\mathbf{w}^\ast\) driven by white noise, using both LMS and NLMS, and plot the learning curves (squared error over iterations).
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
# --- Unknown system to identify ---
L = 16
w_true = np.array([0.5, -0.3, 0.2, 0.1, -0.05, 0.02,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0])
# --- Input and observation ---
N = 5000
x = np.random.randn(N)
v = 0.05 * np.random.randn(N) # measurement noise
d = np.convolve(x, w_true, mode='full')[:N] + v
def lms(x, d, L, mu):
    """Standard LMS: w(n+1) = w(n) + mu * e(n) * x(n), Eq. (7)."""
    N = len(x)
    w = np.zeros(L)
    e = np.zeros(N)
    for n in range(L, N):
        x_vec = x[n:n - L:-1]          # x(n), x(n-1), ..., x(n-L+1), newest first
        y = w @ x_vec                  # filter output, Eq. (3)
        e[n] = d[n] - y                # a-priori error
        w = w + mu * e[n] * x_vec      # coefficient update, Eq. (7)
    return w, e
def nlms(x, d, L, mu_tilde, eps=1e-6):
    """Normalized LMS: step scaled by 1 / (||x(n)||^2 + eps), Eq. (10)."""
    N = len(x)
    w = np.zeros(L)
    e = np.zeros(N)
    for n in range(L, N):
        x_vec = x[n:n - L:-1]          # x(n), x(n-1), ..., x(n-L+1), newest first
        y = w @ x_vec                  # filter output
        e[n] = d[n] - y                # a-priori error
        norm = x_vec @ x_vec + eps     # input energy (eps guards against tiny norms)
        w = w + (mu_tilde / norm) * e[n] * x_vec   # normalized update, Eq. (10)
    return w, e
# --- Run ---
mu_lms = 0.01
mu_nlms = 0.5
w_lms, e_lms = lms(x, d, L, mu_lms)
w_nlms, e_nlms = nlms(x, d, L, mu_nlms)
# --- Learning curve (smoothed squared error) ---
def smooth(v, win=50):
    """Moving-average smoothing of the squared error over a win-sample window."""
    kernel = np.ones(win) / win
    return np.convolve(v ** 2, kernel, mode='same')
plt.figure(figsize=(10, 5))
plt.semilogy(smooth(e_lms), label=f'LMS (μ={mu_lms})')
plt.semilogy(smooth(e_nlms), label=f'NLMS (μ̃={mu_nlms})')
plt.xlabel('Iteration n')
plt.ylabel('Mean Squared Error')
plt.title('Learning Curves: LMS vs NLMS')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print(f"LMS final coeff error: {np.linalg.norm(w_lms - w_true):.4f}")
print(f"NLMS final coeff error: {np.linalg.norm(w_nlms - w_true):.4f}")
The log-scale curves show an exponential decay phase followed by a flat steady-state misadjustment floor. Increasing \(\mu\) (or \(\tilde{\mu}\) ) speeds up the initial decay but raises the floor — the canonical convergence-vs-misadjustment trade-off.
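The same floor can be read off numerically; a short check, run after the script above (the 1000-sample tail is an arbitrary choice), compares each algorithm's steady-state MSE with the measurement-noise floor set by \(v(n)\):
# Empirical steady-state MSE over the last 1000 samples (run after the script above).
tail = 1000
print(f"LMS  steady-state MSE ≈ {np.mean(e_lms[-tail:] ** 2):.4e}")
print(f"NLMS steady-state MSE ≈ {np.mean(e_nlms[-tail:] ** 2):.4e}")
print(f"noise floor E[v^2]    ≈ {np.mean(v ** 2):.4e}")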
Step-Size Sensitivity
plt.figure(figsize=(10, 5))
for mu in [0.005, 0.02, 0.05, 0.1]:
    _, e = lms(x, d, L, mu)
    plt.semilogy(smooth(e), label=f'μ={mu}', alpha=0.8)
plt.xlabel('Iteration n')
plt.ylabel('Mean Squared Error')
plt.title('LMS: Sensitivity to Step Size μ')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Tiny \(\mu\) converges painfully slowly; values near the upper bound oscillate from the start. As a rule of thumb, choose at most half of the bound in (8), or of its practical surrogate (9).
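Continuing in the same session (reusing x and L from the script above), the rule of thumb can be made concrete with the practical bound (9); the factor 0.5 below is just the rule of thumb, not a hard limit:
# Rule-of-thumb step size: half of the practical bound of Eq. (9), for the input above.
P_x = np.mean(x ** 2)                      # input power estimate
mu_bound = 2.0 / (L * P_x)                 # practical bound, Eq. (9)
print(f"practical bound ≈ {mu_bound:.4f}, rule-of-thumb μ ≈ {0.5 * mu_bound:.4f}")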
Connection to the Wiener Filter
For step sizes satisfying (8), LMS converges in mean to the Wiener solution \(\mathbf{w}^\ast = \mathbf{R}^{-1}\mathbf{p}\), with \(\mathbf{p} = E[d(n)\,\mathbf{x}(n)]\):
\[ E\bigl[\mathbf{w}(n)\bigr] \xrightarrow{n \to \infty} \mathbf{w}^\ast \tag{11} \]Because LMS uses an instantaneous gradient, a finite excess MSE remains. RLS (Recursive Least Squares) accelerates convergence at higher cost.
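As a quick cross-check against the system-identification script above (reusing x, d, L, w_true, and w_lms from that session), the Wiener solution can be estimated from sample statistics and compared with the adaptive result; a minimal sketch:
# Sample-based Wiener solution w* = R^{-1} p (run after the script above).
X = np.array([x[n:n - L:-1] for n in range(L, len(x))])  # rows are x(n) vectors, Eq. (1)
R_hat = (X.T @ X) / X.shape[0]                            # sample estimate of R
p_hat = (X.T @ d[L:]) / X.shape[0]                        # sample estimate of p = E[d(n) x(n)]
w_wiener = np.linalg.solve(R_hat, p_hat)
print(f"Wiener vs true:   {np.linalg.norm(w_wiener - w_true):.4f}")
print(f"LMS    vs Wiener: {np.linalg.norm(w_lms - w_wiener):.4f}")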
When to Use What
| Use case | Choice | Reason |
|---|---|---|
| Stationary input level | LMS | Smallest implementation, lowest cost |
| Speech / time-varying signals | NLMS | Robust to input-power swings |
| Fast convergence is critical | RLS | Quadratic cost \(O(L^2)\) but quick |
| Embedded / low-resource targets | LMS / NLMS | \(O(L)\) is sufficient |
Related Articles
- Wiener Filter: Optimal Linear Filtering Theory and Python Implementation - The optimum that LMS approximates iteratively; compares batch (frequency-domain) and adaptive (time-domain) approaches.
- Adaptive Filters (LMS/RLS): Theory and Python Implementation - Broader treatment including RLS and noise-cancellation applications.
- Autocorrelation Function: Theory and Python Implementation - The input autocorrelation matrix \(\mathbf{R}\) governs LMS convergence speed.
- Fast Fourier Transform (FFT): Theory and Python Implementation - Foundation for frequency-domain adaptive filters (FDAF).
- Kalman Filter: Theory and Python Implementation - A different recursive estimator; RLS is mathematically equivalent to a Kalman filter.
- Moving Average Filters Compared: SMA, WMA, and EMA in Python - Contrast with fixed-coefficient filters.
References
- Haykin, S. (2014). Adaptive Filter Theory (5th ed.). Pearson. Chapters 5-6.
- Widrow, B., & Stearns, S. D. (1985). Adaptive Signal Processing. Prentice Hall.
- Sayed, A. H. (2008). Adaptive Filters. Wiley-IEEE Press.