Introduction to Wavelet Transform: Time-Frequency Analysis with Python

Introduction

As discussed in the FFT article, the Fourier transform is a powerful tool for decomposing signals into sinusoidal frequency components. However, it has an inherent limitation: loss of time information.

The Fourier transform tells us “which frequencies are present in the signal as a whole” but not “when those frequencies occurred.” For signals whose frequency content changes over time, such as speech or seismic waves, this limitation is critical.

The Short-Time Fourier Transform (STFT) introduces time localization using a window function, but its fixed window size cannot simultaneously achieve high resolution for both high and low frequencies. The Wavelet Transform solves this problem by providing adaptive resolution in both time and frequency.

This article covers the mathematical foundations of the Continuous Wavelet Transform (CWT) and Discrete Wavelet Transform (DWT), along with Python implementations using PyWavelets.

From Fourier to Wavelet Transform

Limitations of STFT: Fixed Window Size

The STFT segments a signal with a short window and applies the Fourier transform to each segment:

\[\text{STFT}\{x(t)\}(\tau, \omega) = \int_{-\infty}^{\infty} x(t) \, w(t - \tau) \, e^{-j\omega t} \, dt \tag{1}\]

where \(w(t)\) is the window function and \(\tau\) is the time shift. The fundamental issue with STFT is that the window size is fixed:

Short window: Good time resolution, poor frequency resolution
Long window: Good frequency resolution, poor time resolution

This trade-off is governed by the Heisenberg uncertainty principle. There is a lower bound on the product of time spread \(\Delta t\) and frequency spread \(\Delta f\):

\[\Delta t \cdot \Delta f \geq \frac{1}{4\pi} \tag{2}\]

In STFT, the product \(\Delta t \cdot \Delta f\) is fixed at a constant value. Wavelet transforms maintain this constant product but adaptively redistribute \(\Delta t\) and \(\Delta f\) depending on frequency.

The Wavelet Transform Idea

The key insight of wavelet analysis is to use localized waveforms (wavelets) as basis functions instead of sinusoids. Wavelets have finite temporal extent, naturally providing time localization.

By dilating and compressing the wavelet, different frequency bands can be analyzed at different resolutions:

Small scale (compressed wavelet): Captures high-frequency components with high time resolution
Large scale (stretched wavelet): Captures low-frequency components with high frequency resolution

Continuous Wavelet Transform (CWT)

Mother Wavelet

The basis function for the wavelet transform is called the mother wavelet \(\psi(t)\). It must satisfy two conditions:

Finite energy: \(\int_{-\infty}^{\infty} |\psi(t)|^2 \, dt < \infty\)
Admissibility condition: \(C_\psi = \int_0^{\infty} \frac{|\hat{\Psi}(\omega)|^2}{\omega} \, d\omega < \infty\)

The admissibility condition implies that the mother wavelet has zero mean (\(\int \psi(t) \, dt = 0\)), meaning it must be an oscillatory waveform.

CWT Definition

The CWT of a signal \(x(t)\) is defined using the scale parameter \(a > 0\) and translation parameter \(b\):

\[W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t) \, \psi^*\!\left(\frac{t - b}{a}\right) dt \tag{3}\]

where:

\(a\): Scale parameter (dilation). Larger \(a\) stretches the wavelet, capturing lower frequencies
\(b\): Translation parameter. Specifies the position along the signal
\(\psi^*\): Complex conjugate of the mother wavelet
\(\frac{1}{\sqrt{a}}\): Energy normalization factor, ensuring constant wavelet energy across scales

The CWT result \(W(a, b)\) is a two-dimensional time-scale representation, visualized as a scalogram (\(|W(a, b)|^2\)).

Common Mother Wavelets

Morlet Wavelet

\[\psi(t) = \pi^{-1/4} e^{j\omega_0 t} e^{-t^2/2} \tag{4}\]

A complex sinusoid modulated by a Gaussian envelope, where \(\omega_0\) (typically \(5\) or \(6\)) controls the center frequency. It is one of the most widely used wavelets for time-frequency analysis, offering a good balance between time and frequency resolution.

Mexican Hat Wavelet

\[\psi(t) = \frac{2}{\sqrt{3}\pi^{1/4}} (1 - t^2) e^{-t^2/2} \tag{5}\]

The negative normalized second derivative of a Gaussian, also known as the Difference of Gaussians (DoG). It is a real-valued wavelet well-suited for detecting singularities and edges in signals.

Discrete Wavelet Transform (DWT)

Dyadic Sampling

The CWT varies scale and translation continuously, producing highly redundant information. For computational efficiency and theoretical elegance, the Discrete Wavelet Transform (DWT) discretizes scale and translation using powers of two:

\[a = 2^j, \quad b = k \cdot 2^j \tag{6}\]

where \(j\) is the decomposition level and \(k\) is the translation index. The wavelet basis functions become:

\[\psi_{j,k}(t) = 2^{-j/2} \psi(2^{-j} t - k) \tag{7}\]

Multiresolution Analysis (MRA)

The theoretical foundation of the DWT is Multiresolution Analysis (MRA), which decomposes a signal into approximation and detail components at different resolution levels.

MRA is built upon two functions:

Scaling function \(\phi(t)\): Represents low-frequency content (approximation)
Wavelet function \(\psi(t)\): Represents high-frequency content (detail)

The scaling function satisfies the two-scale equation:

\[\phi(t) = \sqrt{2} \sum_k h[k] \, \phi(2t - k) \tag{8}\]

The wavelet function is derived from the scaling function:

\[\psi(t) = \sqrt{2} \sum_k g[k] \, \phi(2t - k) \tag{9}\]

where \(h[k]\) is the low-pass filter coefficients, \(g[k]\) is the high-pass filter coefficients, and they are related by:

\[g[k] = (-1)^k h[N - 1 - k] \tag{10}\]

Filter Bank Decomposition

DWT computation is efficiently implemented as a cascade of filter banks and downsampling operations, known as Mallat’s algorithm.

Each decomposition stage passes the signal through a low-pass filter \(H\) and a high-pass filter \(G\), followed by downsampling by 2:

Approximation Coefficients (cA): Low-pass filter \(H\) + downsampling by 2 -> low-frequency components
Detail Coefficients (cD): High-pass filter \(G\) + downsampling by 2 -> high-frequency components

In multi-level decomposition, the approximation coefficients cA are recursively decomposed further. At level \(j\), the signal is separated into:

\[x(t) = \text{cA}_j + \text{cD}_j + \text{cD}_{j-1} + \cdots + \text{cD}_1 \tag{11}\]

This structure can be understood as a cascaded version of the filter concepts discussed in Low-Pass Filter Design and Comparison.

Python Implementation with PyWavelets

CWT Example: Scalogram of a Chirp Signal

We apply CWT to a chirp signal whose frequency changes over time and visualize the resulting scalogram.

import numpy as np
import matplotlib.pyplot as plt
import pywt

# --- Generate chirp signal ---
fs = 1000  # Sampling frequency [Hz]
t = np.arange(0, 2, 1 / fs)
# Chirp: frequency linearly increases from 10 Hz to 100 Hz
freq = np.linspace(10, 100, len(t))
signal = np.sin(2 * np.pi * np.cumsum(freq) / fs)

# --- Compute CWT ---
scales = np.arange(1, 128)
coefficients, frequencies = pywt.cwt(signal, scales, "morl", 1 / fs)

# --- Plot scalogram ---
plt.figure(figsize=(10, 5))
plt.pcolormesh(
    t, frequencies, np.abs(coefficients) ** 2, shading="gouraud", cmap="viridis"
)
plt.colorbar(label="Power")
plt.xlabel("Time [s]")
plt.ylabel("Frequency [Hz]")
plt.ylim(0, 150)
plt.title("CWT Scalogram (Morlet Wavelet)")
plt.tight_layout()
plt.show()

The scalogram clearly shows the frequency increasing from 10 Hz to 100 Hz over time. Compared to an STFT spectrogram, the wavelet scalogram offers higher frequency resolution at low frequencies and higher time resolution at high frequencies, demonstrating the adaptive resolution property.

DWT Example: Multi-Level Decomposition

We perform multi-level DWT decomposition on a noisy signal and visualize the approximation and detail coefficients at each level.

import numpy as np
import matplotlib.pyplot as plt
import pywt

# --- Generate test signal ---
np.random.seed(42)
fs = 500
t = np.arange(0, 1, 1 / fs)
# Low-frequency + high-frequency + noise
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
noise = 0.3 * np.random.randn(len(t))
signal = clean + noise

# --- Multi-level DWT decomposition ---
wavelet = "db4"  # Daubechies-4 wavelet
level = 5
coeffs = pywt.wavedec(signal, wavelet, level=level)
# coeffs = [cA5, cD5, cD4, cD3, cD2, cD1]

# --- Visualization ---
fig, axes = plt.subplots(level + 2, 1, figsize=(10, 10), sharex=False)

axes[0].plot(t, signal, "k", linewidth=0.5)
axes[0].set_title("Original Signal")
axes[0].set_ylabel("Amplitude")

axes[1].plot(coeffs[0], "b", linewidth=0.5)
axes[1].set_title(f"Approximation Coefficients (cA{level})")
axes[1].set_ylabel("Amplitude")

for i in range(1, level + 1):
    axes[i + 1].plot(coeffs[i], "r", linewidth=0.5)
    axes[i + 1].set_title(f"Detail Coefficients (cD{level - i + 1})")
    axes[i + 1].set_ylabel("Amplitude")

plt.tight_layout()
plt.show()

Examining the decomposition results reveals:

cA5 (approximation coefficients): Contains the lowest-frequency components (the 5 Hz sine wave)
cD1, cD2 (detail coefficients, high levels): Primarily contain high-frequency noise
cD3, cD4 (detail coefficients, mid levels): Contain the 50 Hz sine wave component

Signal Denoising with Wavelet Thresholding

A key application of DWT is wavelet thresholding for signal denoising. Since noise primarily appears in the detail coefficients, zeroing out coefficients below a threshold effectively removes noise.

import numpy as np
import matplotlib.pyplot as plt
import pywt

# --- Generate test signal ---
np.random.seed(42)
fs = 500
t = np.arange(0, 1, 1 / fs)
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
noise = 0.3 * np.random.randn(len(t))
signal = clean + noise

# --- DWT decomposition ---
wavelet = "db4"
level = 5
coeffs = pywt.wavedec(signal, wavelet, level=level)

# --- Thresholding ---
# Universal threshold (VisuShrink): lambda = sigma * sqrt(2 * log(N))
sigma = np.median(np.abs(coeffs[-1])) / 0.6745  # Robust noise estimation
threshold = sigma * np.sqrt(2 * np.log(len(signal)))

# Apply soft thresholding to detail coefficients
coeffs_thresh = [coeffs[0]]  # Keep approximation coefficients
for c in coeffs[1:]:
    coeffs_thresh.append(pywt.threshold(c, threshold, mode="soft"))

# --- Inverse DWT reconstruction ---
denoised = pywt.waverec(coeffs_thresh, wavelet)

# --- Visualization ---
fig, axes = plt.subplots(3, 1, figsize=(10, 7), sharex=True)

axes[0].plot(t, clean, "b", linewidth=0.8)
axes[0].set_title("Original Clean Signal")
axes[0].set_ylabel("Amplitude")

axes[1].plot(t, signal, "k", linewidth=0.5)
axes[1].set_title(f"Noisy Signal (SNR = {10*np.log10(np.var(clean)/np.var(noise)):.1f} dB)")
axes[1].set_ylabel("Amplitude")

axes[2].plot(t, denoised[: len(t)], "r", linewidth=0.8)
axes[2].set_title("Denoised Signal (Wavelet Thresholding)")
axes[2].set_ylabel("Amplitude")

for ax in axes:
    ax.grid(True, alpha=0.3)
axes[2].set_xlabel("Time [s]")

plt.tight_layout()
plt.show()

# --- Quantitative evaluation ---
mse_noisy = np.mean((signal - clean) ** 2)
mse_denoised = np.mean((denoised[: len(t)] - clean) ** 2)
print(f"MSE of noisy signal:    {mse_noisy:.6f}")
print(f"MSE after denoising:    {mse_denoised:.6f}")
print(f"MSE improvement:        {(1 - mse_denoised / mse_noisy) * 100:.1f}%")

Soft thresholding shrinks coefficients exceeding the threshold \(\lambda\) by \(\lambda\) and zeros out those below:

\[\text{soft}(x, \lambda) = \text{sign}(x) \cdot \max(|x| - \lambda, 0) \tag{12}\]

The universal threshold \(\lambda = \sigma\sqrt{2 \log N}\) (VisuShrink) is automatically determined from the noise standard deviation \(\sigma\) and signal length \(N\), requiring no parameter tuning. The noise standard deviation \(\sigma\) is robustly estimated using the MAD (Median Absolute Deviation).

Comparison: FFT vs STFT vs Wavelet

Feature	FFT	STFT	Wavelet
Time resolution	None	Fixed	Adaptive (high at high-freq)
Frequency resolution	High (global)	Fixed	Adaptive (high at low-freq)
Basis function	Sinusoid	Windowed sinusoid	Mother wavelet
Output	1D spectrum	2D (time-frequency)	2D (time-scale)
Inverse transform	Yes	Yes	Yes
Complexity	\(O(N \log N)\)	\(O(N L \log L)\)	CWT: \(O(NS)\), DWT: \(O(N)\)
Primary use	Stationary signals	Speech/music analysis	Non-stationary, denoising

Here \(L\) is the STFT window length and \(S\) is the number of CWT scales. DWT achieves \(O(N)\) complexity through its filter bank implementation, making it highly efficient.

Guidelines for choosing the appropriate method:

Stationary signals where you need the overall frequency content -> FFT
Time-varying frequencies tracked at uniform resolution -> STFT
Simultaneous detection of slow trends and rapid transients -> Wavelet Transform

Summary

This article covered the mathematical foundations and Python implementations of the wavelet transform.

CWT varies scale and translation continuously, achieving adaptive time-frequency resolution
DWT achieves efficient \(O(N)\) decomposition through dyadic sampling and filter banks
Wavelet thresholding is a practical denoising technique that applies thresholds to detail coefficients
FFT, STFT, and wavelet transforms each excel in different scenarios; choosing the right tool depends on the analysis target

The wavelet transform extends beyond signal processing into image compression (JPEG 2000), financial data analysis, biomedical signal processing, seismic analysis, and many other fields.

Fast Fourier Transform (FFT): Theory and Python Implementation - Detailed coverage of the Fourier transform, providing the foundation for comparison with wavelets.
Window Functions and Power Spectral Density (PSD): Theory and Python Implementation - FFT window functions and PSD estimation in detail.
Low-Pass Filter Design and Comparison - Enables comparison between classical filter design and the wavelet approach.
Time-Series Anomaly Detection: From Statistical Methods to Kalman Filter in Python - Anomaly detection methods where wavelet transform can serve as preprocessing.
Wiener Filter: Optimal Linear Filtering Theory and Python Implementation - A different approach to optimal noise removal.
Frequency Characteristics of the Exponential Moving Average (EMA) Filter - Helps understand the difference between time-domain filtering and wavelet-based frequency separation.
Moving Average Filters Compared: SMA, WMA, and EMA in Python - Helps understand the relationship between time-domain filters and wavelet filter banks.
Matplotlib Practical Tips: Creating Publication-Quality Figures - Tips for producing higher-quality scalogram visualizations.

References

Mallat, S. (2008). A Wavelet Tour of Signal Processing: The Sparse Way (3rd ed.). Academic Press.
Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM.
PyWavelets documentation
SciPy Signal Processing documentation