Median Filter Theory and Python Implementation: Non-Linear Filter for Impulse Noise Removal

A complete guide to the median filter's mathematical principles, comparison with linear filters (EMA, moving average), impulse noise removal, and Python implementation using scipy.signal.medfilt.

What Is a Median Filter?

A median filter is a non-linear filter that sorts the values within a sliding window and outputs the median (middle value) of those sorted samples.

Unlike linear filters such as the EMA filter or moving average filter, the median filter is exceptionally effective at removing impulse noise (spike-like noise) while preserving signal edges.

Common Applications

  • Impulse noise removal: Sensor glitches, bit-flip errors in digital transmission
  • Image processing: Salt-and-pepper noise removal
  • Medical signal processing: ECG artifact removal
  • Financial data: Filtering of erroneous tick data in price series

Mathematical Definition

For a window of size \(2m+1\) (where \(m\) is the one-sided half-width), the output \(y[n]\) at time \(n\) is:

\[ y[n] = \text{median}\bigl(x[n-m],\, x[n-m+1],\, \dots,\, x[n],\, \dots,\, x[n+m]\bigr) \tag{1} \]

The \(2m+1\) values are sorted, and the \((m+1)\)-th value (the middle one) is selected as the output.

Example (window size \(= 5\), \(m = 2\)):

Suppose a portion of the input is \([\ldots, 3, 4, \mathbf{100}, 4, 3, \ldots]\) where the bold value is a spike:

\[ \text{sort}(3, 4, 100, 4, 3) = (3, 3, \mathbf{4}, 4, 100) \Rightarrow y[n] = 4 \tag{2} \]

The spike value \(100\) is completely eliminated, and the true signal value \(4\) is recovered.

Key Differences from Linear Filters

The Superposition Principle Does Not Hold

Linear filters compute outputs as weighted sums of inputs. Median filtering involves sorting, so the superposition principle fails:

\[ \text{median}(ax_1 + bx_2) \neq a\cdot\text{median}(x_1) + b\cdot\text{median}(x_2) \tag{3} \]

This non-linearity is precisely what gives the median filter its unique advantage over linear filters for impulse noise.

Impulse Noise Robustness

Linear filters spread impulse noise across neighboring samples. The median filter confines the spike’s influence within the window and eliminates it through sorting.

FilterGaussian Noise RemovalImpulse Noise RemovalEdge Preservation
Moving AverageGoodPoor (spreads)Low
EMAGood (weighted)Poor (spreads)Low
Savitzky-GolayGoodPoorHigh
MedianModerateExcellentHigh

Python Implementation

Manual Implementation

import numpy as np

def median_filter_manual(x: np.ndarray, window_size: int) -> np.ndarray:
    """Median filter with reflect padding at edges."""
    assert window_size % 2 == 1, "window_size must be odd"
    m = window_size // 2
    n = len(x)
    y = np.empty(n)
    x_padded = np.pad(x, m, mode="reflect")
    for i in range(n):
        window = x_padded[i : i + window_size]
        y[i] = np.median(window)
    return y

Using scipy.signal.medfilt

In practice, scipy.signal.medfilt is the preferred choice:

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

# --- Signal generation ---
np.random.seed(42)
t = np.linspace(0, 1, 500)
x_clean = np.sin(2 * np.pi * 5 * t)
noise_gaussian = 0.2 * np.random.randn(len(t))
x_noisy = x_clean + noise_gaussian

# Add impulse noise
rng = np.random.default_rng(0)
impulse_indices = rng.choice(len(t), size=30, replace=False)
x_impulse = x_noisy.copy()
x_impulse[impulse_indices] += rng.choice([-4.0, 4.0], size=30)

# --- Apply filters ---
window_size = 11

y_median = signal.medfilt(x_impulse, kernel_size=window_size)
y_ma = np.convolve(x_impulse, np.ones(window_size) / window_size, mode="same")

# --- Plot ---
fig, axes = plt.subplots(3, 1, figsize=(12, 9), sharex=True)

axes[0].plot(t, x_impulse, color="gray", alpha=0.7, label="Noisy signal (with impulses)")
axes[0].plot(t, x_clean, color="blue", linestyle="--", label="Clean signal", linewidth=2)
axes[0].set_ylabel("Amplitude")
axes[0].legend()
axes[0].set_title("Input Signal (Impulse Noise Added)")

axes[1].plot(t, y_median, color="red", label=f"Median filter (window={window_size})")
axes[1].plot(t, x_clean, color="blue", linestyle="--", label="Clean signal", linewidth=2)
axes[1].set_ylabel("Amplitude")
axes[1].legend()
axes[1].set_title("Median Filter Output")

axes[2].plot(t, y_ma, color="green", label=f"Moving average (window={window_size})")
axes[2].plot(t, x_clean, color="blue", linestyle="--", label="Clean signal", linewidth=2)
axes[2].set_ylabel("Amplitude")
axes[2].set_xlabel("Time [s]")
axes[2].legend()
axes[2].set_title("Moving Average Output (Comparison)")

plt.tight_layout()
plt.show()

Window Size Selection

The window size \(W = 2m+1\) is the only parameter of the median filter.

import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

np.random.seed(0)
t = np.linspace(0, 1, 300)
x_clean = np.sin(2 * np.pi * 3 * t)
x_noisy = x_clean.copy()
idx = np.random.choice(len(t), size=30, replace=False)
x_noisy[idx] += np.random.choice([-3, 3], size=30)

window_sizes = [3, 7, 15, 31]
fig, axes = plt.subplots(len(window_sizes), 1, figsize=(12, 10), sharex=True)

for ax, ws in zip(axes, window_sizes):
    y = signal.medfilt(x_noisy, kernel_size=ws)
    mse = np.mean((y - x_clean) ** 2)
    ax.plot(t, x_noisy, color="gray", alpha=0.5, label="Noisy input")
    ax.plot(t, y, color="red", label=f"window={ws}, MSE={mse:.4f}")
    ax.plot(t, x_clean, color="blue", linestyle="--", label="Clean signal", linewidth=1.5)
    ax.legend(loc="upper right")
    ax.set_ylabel("Amplitude")

axes[-1].set_xlabel("Time [s]")
plt.suptitle("Median Filter with Different Window Sizes")
plt.tight_layout()
plt.show()

Selection guidelines:

  • Small window (3–5): Removes fine spikes while preserving signal detail. Suitable for sparse noise.
  • Large window (11–31): Stronger denoising but may blunt sharp peaks and edges.
  • Consecutive noise: The window size must exceed the length of any consecutive noise run.

Combining with EMA for Mixed Noise

The median filter excels at impulse noise but is only moderate for Gaussian noise. A practical two-stage pipeline combines both:

from scipy import signal
import numpy as np

def hybrid_filter(x: np.ndarray, median_window: int = 5, ema_alpha: float = 0.2) -> np.ndarray:
    """
    Two-stage filter: Median → EMA
    Step 1: Remove impulse noise with median filter
    Step 2: Smooth residual Gaussian noise with EMA
    """
    # Step 1: Impulse removal
    y_median = signal.medfilt(x, kernel_size=median_window)
    # Step 2: EMA smoothing
    y_ema = np.zeros_like(y_median)
    y_ema[0] = y_median[0]
    for i in range(1, len(y_median)):
        y_ema[i] = ema_alpha * y_median[i] + (1 - ema_alpha) * y_ema[i - 1]
    return y_ema

This two-stage design is widely used in sensor data pipelines and communication system denoising.

Computational Complexity and Edge Handling

Complexity

A naive implementation sorts \(W\) values per output sample, giving \(O(W \log W)\) per sample and \(O(NW \log W)\) overall. scipy.signal.medfilt uses efficient internal algorithms. For large-scale data, scipy.ndimage.median_filter is often faster and offers more flexible edge-handling options:

from scipy import ndimage

# reflect padding (generally recommended)
y_reflect = ndimage.median_filter(x, size=11, mode="reflect")
# nearest-neighbor padding
y_nearest = ndimage.median_filter(x, size=11, mode="nearest")

Edge Handling

scipy.signal.medfilt uses zero-padding by default, which can cause artifacts at signal boundaries. Use scipy.ndimage.median_filter with mode="reflect" to avoid this.

Summary

PropertyMedian Filter
Filter typeNon-linear
Impulse noise removalExcellent
Gaussian noise removalModerate
Edge preservationHigh
Parameters1 (window size \(W\))
Complexity\(O(NW \log W)\)
CausalitySemi-causal (non-causal with symmetric window)

Use the median filter when impulse noise dominates. For Gaussian-dominated noise, prefer the EMA filter or Butterworth filter. For peak-shape preservation, consider the Savitzky-Golay filter. For time-varying environments, use adaptive filters.

References