Convolution and Correlation Fundamentals: Theory, Convolution Theorem, and Python Implementation

Convolution and correlation: discrete definitions, the convolution theorem, FFT-based fast convolution, and the moving-average link, with NumPy/SciPy implementations in Python.

Introduction

Convolution and correlation are foundational operations in signal processing. They underpin filtering, feature extraction, pattern matching, and system identification, and they form the mathematical bedrock of frequency-domain methods such as the FFT and the wavelet transform.

This article reviews the continuous and discrete definitions of convolution and correlation, clarifies how the two are related, and uses the convolution theorem to bridge the time and frequency domains. We then implement and benchmark np.convolve, scipy.signal.correlate, and FFT-based fast convolution in Python. Finally, we show that the moving average filter is simply a convolution with a box window.

Definition of Convolution

Continuous Convolution

For two continuous functions \(f(t)\) and \(g(t)\), convolution is defined as

\[(f * g)(t) = \int_{-\infty}^{\infty} f(\tau) \cdot g(t - \tau) \, d\tau \tag{1}\]

Intuitively, we flip \(g\) in time, slide it by \(t\), multiply with \(f\), and integrate. Because the response of a linear time-invariant (LTI) system is the convolution of its input with its impulse response, filtering is convolution in the most fundamental sense.

Discrete Convolution

For discrete signals \(x[n]\) and \(h[n]\),

\[y[n] = (x * h)[n] = \sum_{k=-\infty}^{\infty} x[k] \cdot h[n - k] \tag{2}\]

In practice, we time-reverse \(h\), slide it across \(x\), take elementwise products, and sum. If \(x\) has length \(N\) and \(h\) has length \(M\), the linear convolution output has length \(N + M - 1\).
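As a sanity check, equation \((2)\) can be evaluated directly with two nested loops. This is a deliberately naive sketch to make the index bookkeeping explicit; np.convolve is the practical choice.

```python
import numpy as np

def conv_direct(x, h):
    """Directly evaluate equation (2) for finite-length signals."""
    N, M = len(x), len(h)
    y = np.zeros(N + M - 1)
    for n in range(N + M - 1):
        for k in range(N):
            # h[n - k] is only defined for 0 <= n - k < M
            if 0 <= n - k < M:
                y[n] += x[k] * h[n - k]
    return y

x = np.array([1.0, 2.0, 3.0])
h = np.array([1.0, 1.0])
print(conv_direct(x, h))    # [1. 3. 5. 3.]
print(np.convolve(x, h))    # same result, length N + M - 1 = 4
```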

Algebraic Properties

Convolution satisfies several useful identities.

  • Commutativity: \(x * h = h * x\)
  • Associativity: \((x * h) * g = x * (h * g)\)
  • Distributivity: \(x * (h + g) = x * h + x * g\)

Cascaded filters can therefore be merged into a single equivalent kernel.
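The merging claim follows from associativity and is easy to verify numerically; the random test signals below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # input signal
h = rng.standard_normal(5)    # first filter kernel
g = rng.standard_normal(7)    # second filter kernel

# Two cascaded filters...
cascade = np.convolve(np.convolve(x, h), g)

# ...equal a single filter whose kernel is h * g (associativity)
merged_kernel = np.convolve(h, g)
merged = np.convolve(x, merged_kernel)

print(np.allclose(cascade, merged))   # True
```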

Definition of Correlation

Cross-Correlation

The cross-correlation of \(x[n]\) and \(y[n]\) at lag \(\ell\) is

\[R_{xy}[\ell] = \sum_{n=-\infty}^{\infty} x[n] \cdot y[n + \ell] \tag{3}\]

It measures how similar \(x\) is to a shifted version of \(y\), and is widely used for delay estimation and template matching.
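A typical delay-estimation sketch: the 40-sample delay and the random test signal below are assumptions for illustration, and scipy.signal.correlation_lags maps output indices to lags.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.concatenate([np.zeros(40), x])[:len(x)]   # x delayed by 40 samples

# The cross-correlation peaks at the lag that best aligns y with x
R = correlate(y, x, mode='full')
lags = correlation_lags(len(y), len(x), mode='full')
est_delay = lags[np.argmax(R)]
print(est_delay)   # 40
```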

Auto-Correlation

When \(y = x\), we obtain the auto-correlation, which detects periodicities and self-similarity in a single signal:

\[R_{xx}[\ell] = \sum_{n=-\infty}^{\infty} x[n] \cdot x[n + \ell] \tag{4}\]

\(R_{xx}[0]\) equals the signal's total energy (or, for a zero-mean signal after normalization, its variance) and is the global maximum of \(R_{xx}\).
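A quick periodicity check on a 50 Hz tone (the sampling rate and frequency are illustrative): the auto-correlation peaks at zero lag, and its first dominant positive-lag peak sits one period away.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

fs = 1000
t = np.arange(0, 1, 1/fs)
x = np.sin(2*np.pi*50*t)              # 50 Hz tone: period = fs/50 = 20 samples

R = correlate(x, x, mode='full')
lags = correlation_lags(len(x), len(x), mode='full')

zero_lag = lags[np.argmax(R)]         # global maximum at lag 0
pos = lags > 0
period = lags[pos][np.argmax(R[pos])] # strongest positive-lag peak = one period
print(zero_lag, period)               # 0 20
```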

Relationship to Convolution

Cross-correlation and convolution differ only in whether one argument is time-reversed. For real signals,

\[R_{xy}[\ell] = (\tilde{x} * y)[\ell], \quad \tilde{x}[n] = x[-n] \tag{5}\]

so correlation can be implemented as convolution against a time-reversed copy of one signal. For complex signals, also take the complex conjugate (\(\tilde{x}[n] = x^*[-n]\)).
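SciPy's correlate follows exactly this recipe, reversing and conjugating its second argument internally; a numerical check with random complex test signals:

```python
import numpy as np
from scipy.signal import correlate

rng = np.random.default_rng(0)
x = rng.standard_normal(50) + 1j * rng.standard_normal(50)
y = rng.standard_normal(20) + 1j * rng.standard_normal(20)

# correlation = convolution with the reversed, conjugated kernel
lhs = correlate(x, y, mode='full')
rhs = np.convolve(x, y[::-1].conj(), mode='full')
print(np.allclose(lhs, rhs))   # True
```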

The Convolution Theorem

Time-Domain Convolution ↔ Frequency-Domain Multiplication

The most consequential property of convolution is that it becomes pointwise multiplication in the frequency domain. With the discrete Fourier transform (DFT) denoted \(\mathcal{F}\),

\[\mathcal{F}\{x * h\}[k] = X[k] \cdot H[k] \tag{6}\]

Conversely, multiplication in time corresponds to (circular) convolution in frequency — the very mechanism behind spectral leakage discussed in the windowing/PSD article:

\[\mathcal{F}\{x \cdot h\}[k] = \frac{1}{N}(X * H)[k] \tag{7}\]
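Equation \((7)\) can be verified numerically. Note that for length-\(N\) DFTs the convolution on the right-hand side is circular, i.e. indices are taken modulo \(N\); the random test signals are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N)
h = rng.standard_normal(N)

X, H = np.fft.fft(x), np.fft.fft(h)
lhs = np.fft.fft(x * h)

# Circular convolution of the two spectra by direct summation:
# (X * H)[k] = sum_m X[m] H[(k - m) mod N]
rhs = np.array([X @ H[(k - np.arange(N)) % N] for k in range(N)]) / N
print(np.allclose(lhs, rhs))   # True
```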

Fast Convolution Complexity

Direct convolution of two length-\(N\) signals costs \(O(N^2)\). Combined with the \(O(N \log N)\) FFT, the convolution theorem gives

\[y = \mathcal{F}^{-1}\{ \mathcal{F}\{x\} \cdot \mathcal{F}\{h\} \} \tag{8}\]

at total cost \(O(N \log N)\). For long kernels this is dramatically faster in practice, often by one to two orders of magnitude. To produce linear convolution (rather than circular), zero-pad both inputs to length at least \(N + M - 1\).
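A small demonstration, on toy signals, of why the padding matters: without it, the DFT product yields circular convolution and the tail of the linear result wraps around onto the head.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 1.0, 1.0])

# Length-4 FFTs give *circular* convolution: the tail wraps around
circ = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n=len(x))))
print(circ)                                  # [8. 7. 6. 9.]

# Zero-padding to N + M - 1 = 6 recovers the linear convolution
L = len(x) + len(h) - 1
lin = np.real(np.fft.ifft(np.fft.fft(x, n=L) * np.fft.fft(h, n=L)))
print(np.allclose(lin, np.convolve(x, h)))   # True
```

The linear result is [1, 3, 6, 9, 7, 4]; folding the last two samples back onto the first two gives the wrapped [8, 7, 6, 9].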

Correlation Theorem

A similar identity holds for correlation:

\[\mathcal{F}\{R_{xy}\}[k] = X^*[k] \cdot Y[k] \tag{9}\]

In particular, the Fourier transform of the auto-correlation equals the power spectral density (Wiener–Khinchin theorem), which is the theoretical foundation of PSD estimation.
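A numerical check of Wiener–Khinchin in its circular (DFT) form, using an assumed random test signal: the inverse DFT of the power spectrum equals the circular auto-correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
x = rng.standard_normal(N)

# Inverse DFT of the power spectrum |X|^2 ...
psd = np.abs(np.fft.fft(x))**2
R_wk = np.real(np.fft.ifft(psd))

# ... equals the circular auto-correlation computed directly:
# R[l] = sum_n x[n] x[(n + l) mod N]
R_direct = np.array([np.sum(x * np.roll(x, -l)) for l in range(N)])
print(np.allclose(R_wk, R_direct))   # True
```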

Python Implementation

np.convolve and scipy.signal.correlate

We start by validating the basic APIs.

import numpy as np
from scipy.signal import correlate

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
h = np.array([1.0, 0.5, 0.25])

# Linear convolution (mode='full' yields length N+M-1)
y_conv = np.convolve(x, h, mode='full')
print('convolve:', y_conv)

# Cross-correlation (mode='full')
y_corr = correlate(x, h, mode='full')
print('correlate:', y_corr)

# Verify equation (5): correlation = convolution with flipped h
y_corr_via_conv = np.convolve(x, h[::-1], mode='full')
print('correlate via flipped conv:', y_corr_via_conv)

The outputs of correlate(x, h) and np.convolve(x, h[::-1]) agree elementwise, confirming the time-reversal relationship of equation \((5)\).

FFT-Based Fast Convolution

We benchmark direct convolution against FFT-based convolution for a long kernel.

import numpy as np
import time
from scipy.signal import fftconvolve

np.random.seed(0)
N = 1 << 16   # 65536
M = 1 << 12   # 4096
x = np.random.randn(N)
h = np.random.randn(M)

# Direct convolution
t0 = time.perf_counter()
y_direct = np.convolve(x, h, mode='full')
t_direct = time.perf_counter() - t0

# SciPy FFT convolution
t0 = time.perf_counter()
y_fft = fftconvolve(x, h, mode='full')
t_fft = time.perf_counter() - t0

# Manual FFT convolution implementing equation (8)
L = N + M - 1
n_fft = 1 << int(np.ceil(np.log2(L)))
t0 = time.perf_counter()
X = np.fft.rfft(x, n=n_fft)
H = np.fft.rfft(h, n=n_fft)
y_manual = np.fft.irfft(X * H, n=n_fft)[:L]
t_manual = time.perf_counter() - t0

print(f'direct  : {t_direct*1e3:7.2f} ms')
print(f'fftconv : {t_fft*1e3:7.2f} ms')
print(f'manual  : {t_manual*1e3:7.2f} ms')
print(f'max err : {np.max(np.abs(y_direct - y_fft)):.2e}')

For \(N = 65536\) and \(M = 4096\), np.convolve typically takes hundreds of milliseconds, while fftconvolve finishes in roughly a dozen milliseconds (exact timings are machine-dependent). The maximum error is at floating-point round-off level (\(\sim 10^{-10}\)).

Connection to the Moving Average

A simple moving average (SMA) of length \(L\) is a convolution with the box window \(h_{\text{box}}[n]\) whose coefficients are all \(1/L\):

\[h_{\text{box}}[n] = \frac{1}{L}, \quad 0 \leq n \leq L - 1 \tag{10}\]

\[y[n] = \frac{1}{L}\sum_{k=0}^{L-1} x[n - k] = (x * h_{\text{box}})[n] \tag{11}\]

import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import uniform_filter1d

np.random.seed(1)
fs = 1000
t = np.arange(0, 1, 1/fs)
signal = np.sin(2*np.pi*5*t) + 0.4*np.random.randn(len(t))

L = 21
h_box = np.ones(L) / L

# Moving average = convolution with the box window
sma = np.convolve(signal, h_box, mode='same')

# Cross-check with scipy.ndimage.uniform_filter1d
sma_ref = uniform_filter1d(signal, size=L, mode='nearest')

plt.figure(figsize=(10, 4))
plt.plot(t, signal, alpha=0.4, label='Noisy signal')
plt.plot(t, sma, label=f'Convolution with box (L={L})')
plt.plot(t, sma_ref, '--', label='uniform_filter1d (reference)')
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Up to boundary handling, both outputs coincide, confirming that the moving average is a special case of convolution. Replacing the kernel by a triangular or Hann window naturally extends the idea to more general FIR filters.
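As a sketch of that extension, the box kernel can be swapped for a Hann window of the same assumed length, normalized to unit sum so the filter preserves the signal mean:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 1000
t = np.arange(0, 1, 1/fs)
x = np.sin(2*np.pi*5*t) + 0.4*rng.standard_normal(len(t))

L = 21
h_hann = np.hanning(L)
h_hann /= h_hann.sum()          # unit DC gain: the mean passes through unchanged

smoothed = np.convolve(x, h_hann, mode='same')
print(np.std(smoothed) < np.std(x))   # True: smoothing reduces the noise variance
```

Compared with the box, the tapered kernel weights nearby samples more heavily and has much lower sidelobes in the frequency domain.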

Summary

  • Convolution flips one signal, slides it, and accumulates the elementwise product; it fully describes any LTI system response.
  • Correlation is similar but without the time reversal: cross-correlation \(R_{xy}\) enables template matching, while auto-correlation \(R_{xx}\) uncovers periodic structure.
  • The convolution theorem (equation \((6)\)) turns time-domain convolution into frequency-domain multiplication, enabling \(O(N \log N)\) fast convolution via the FFT.
  • Moving average filters are convolutions with a box window — the simplest FIR filter.
  • Wiener–Khinchin shows that auto-correlation and PSD are two views of the same quantity.

Mastering these primitives makes filter design, spectral analysis, and time-frequency analysis feel like one coherent mathematical framework.

References

  • Oppenheim, A. V., & Schafer, R. W. (2009). Discrete-Time Signal Processing (3rd ed.). Prentice Hall.
  • Smith, S. W. (1997). The Scientist and Engineer’s Guide to Digital Signal Processing. California Technical Publishing.
  • NumPy convolve documentation
  • SciPy signal documentation