Introduction to Wavelet Transform: Time-Frequency Analysis with Python

Mathematical foundations of CWT and DWT with time-frequency analysis implementations using PyWavelets in Python.

Introduction

As discussed in the FFT article, the Fourier transform is a powerful tool for decomposing signals into sinusoidal frequency components. However, it has an inherent limitation: loss of time information.

The Fourier transform tells us “which frequencies are present in the signal as a whole” but not “when those frequencies occurred.” For signals whose frequency content changes over time, such as speech or seismic waves, this limitation is critical.

The Short-Time Fourier Transform (STFT) introduces time localization using a window function, but its fixed window size cannot simultaneously achieve high resolution for both high and low frequencies. The Wavelet Transform solves this problem by providing adaptive resolution in both time and frequency.

This article covers the mathematical foundations of the Continuous Wavelet Transform (CWT) and Discrete Wavelet Transform (DWT), along with Python implementations using PyWavelets.

From Fourier to Wavelet Transform

Limitations of STFT: Fixed Window Size

The STFT segments a signal with a short window and applies the Fourier transform to each segment:

\[\text{STFT}\{x(t)\}(\tau, \omega) = \int_{-\infty}^{\infty} x(t) \, w(t - \tau) \, e^{-j\omega t} \, dt \tag{1}\]

where \(w(t)\) is the window function and \(\tau\) is the time shift. The fundamental issue with STFT is that the window size is fixed:

  • Short window: Good time resolution, poor frequency resolution
  • Long window: Good frequency resolution, poor time resolution

This trade-off is governed by the Heisenberg uncertainty principle. There is a lower bound on the product of time spread \(\Delta t\) and frequency spread \(\Delta f\):

\[\Delta t \cdot \Delta f \geq \frac{1}{4\pi} \tag{2}\]

In STFT, the product \(\Delta t \cdot \Delta f\) is fixed at a constant value. Wavelet transforms maintain this constant product but adaptively redistribute \(\Delta t\) and \(\Delta f\) depending on frequency.

The Wavelet Transform Idea

The key insight of wavelet analysis is to use localized waveforms (wavelets) as basis functions instead of sinusoids. Wavelets have finite temporal extent, naturally providing time localization.

By dilating and compressing the wavelet, different frequency bands can be analyzed at different resolutions:

  • Small scale (compressed wavelet): Captures high-frequency components with high time resolution
  • Large scale (stretched wavelet): Captures low-frequency components with high frequency resolution

Continuous Wavelet Transform (CWT)

Mother Wavelet

The basis function for the wavelet transform is called the mother wavelet \(\psi(t)\). It must satisfy two conditions:

  1. Finite energy: \(\int_{-\infty}^{\infty} |\psi(t)|^2 \, dt < \infty\)
  2. Admissibility condition: \(C_\psi = \int_0^{\infty} \frac{|\hat{\Psi}(\omega)|^2}{\omega} \, d\omega < \infty\)

The admissibility condition implies that the mother wavelet has zero mean (\(\int \psi(t) \, dt = 0\)), meaning it must be an oscillatory waveform.

CWT Definition

The CWT of a signal \(x(t)\) is defined using the scale parameter \(a > 0\) and translation parameter \(b\):

\[W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t) \, \psi^*\!\left(\frac{t - b}{a}\right) dt \tag{3}\]

where:

  • \(a\): Scale parameter (dilation). Larger \(a\) stretches the wavelet, capturing lower frequencies
  • \(b\): Translation parameter. Specifies the position along the signal
  • \(\psi^*\): Complex conjugate of the mother wavelet
  • \(\frac{1}{\sqrt{a}}\): Energy normalization factor, ensuring constant wavelet energy across scales

The CWT result \(W(a, b)\) is a two-dimensional time-scale representation, visualized as a scalogram (\(|W(a, b)|^2\)).

Common Mother Wavelets

Morlet Wavelet

\[\psi(t) = \pi^{-1/4} e^{j\omega_0 t} e^{-t^2/2} \tag{4}\]

A complex sinusoid modulated by a Gaussian envelope, where \(\omega_0\) (typically \(5\) or \(6\)) controls the center frequency. It is one of the most widely used wavelets for time-frequency analysis, offering a good balance between time and frequency resolution.

Mexican Hat Wavelet

\[\psi(t) = \frac{2}{\sqrt{3}\pi^{1/4}} (1 - t^2) e^{-t^2/2} \tag{5}\]

The negative normalized second derivative of a Gaussian, also known as the Difference of Gaussians (DoG). It is a real-valued wavelet well-suited for detecting singularities and edges in signals.

Discrete Wavelet Transform (DWT)

Dyadic Sampling

The CWT varies scale and translation continuously, producing highly redundant information. For computational efficiency and theoretical elegance, the Discrete Wavelet Transform (DWT) discretizes scale and translation using powers of two:

\[a = 2^j, \quad b = k \cdot 2^j \tag{6}\]

where \(j\) is the decomposition level and \(k\) is the translation index. The wavelet basis functions become:

\[\psi_{j,k}(t) = 2^{-j/2} \psi(2^{-j} t - k) \tag{7}\]

Multiresolution Analysis (MRA)

The theoretical foundation of the DWT is Multiresolution Analysis (MRA), which decomposes a signal into approximation and detail components at different resolution levels.

MRA is built upon two functions:

  • Scaling function \(\phi(t)\): Represents low-frequency content (approximation)
  • Wavelet function \(\psi(t)\): Represents high-frequency content (detail)

The scaling function satisfies the two-scale equation:

\[\phi(t) = \sqrt{2} \sum_k h[k] \, \phi(2t - k) \tag{8}\]

The wavelet function is derived from the scaling function:

\[\psi(t) = \sqrt{2} \sum_k g[k] \, \phi(2t - k) \tag{9}\]

where \(h[k]\) is the low-pass filter coefficients, \(g[k]\) is the high-pass filter coefficients, and they are related by:

\[g[k] = (-1)^k h[N - 1 - k] \tag{10}\]

Filter Bank Decomposition

DWT computation is efficiently implemented as a cascade of filter banks and downsampling operations, known as Mallat’s algorithm.

Each decomposition stage passes the signal through a low-pass filter \(H\) and a high-pass filter \(G\), followed by downsampling by 2:

  • Approximation Coefficients (cA): Low-pass filter \(H\) + downsampling by 2 -> low-frequency components
  • Detail Coefficients (cD): High-pass filter \(G\) + downsampling by 2 -> high-frequency components

In multi-level decomposition, the approximation coefficients cA are recursively decomposed further. At level \(j\), the signal is separated into:

\[x(t) = \text{cA}_j + \text{cD}_j + \text{cD}_{j-1} + \cdots + \text{cD}_1 \tag{11}\]

This structure can be understood as a cascaded version of the filter concepts discussed in Low-Pass Filter Design and Comparison.

Python Implementation with PyWavelets

CWT Example: Scalogram of a Chirp Signal

We apply CWT to a chirp signal whose frequency changes over time and visualize the resulting scalogram.

import numpy as np
import matplotlib.pyplot as plt
import pywt

# --- Generate chirp signal ---
fs = 1000  # Sampling frequency [Hz]
t = np.arange(0, 2, 1 / fs)
# Chirp: frequency linearly increases from 10 Hz to 100 Hz
freq = np.linspace(10, 100, len(t))
signal = np.sin(2 * np.pi * np.cumsum(freq) / fs)

# --- Compute CWT ---
scales = np.arange(1, 128)
coefficients, frequencies = pywt.cwt(signal, scales, "morl", 1 / fs)

# --- Plot scalogram ---
plt.figure(figsize=(10, 5))
plt.pcolormesh(
    t, frequencies, np.abs(coefficients) ** 2, shading="gouraud", cmap="viridis"
)
plt.colorbar(label="Power")
plt.xlabel("Time [s]")
plt.ylabel("Frequency [Hz]")
plt.ylim(0, 150)
plt.title("CWT Scalogram (Morlet Wavelet)")
plt.tight_layout()
plt.show()

The scalogram clearly shows the frequency increasing from 10 Hz to 100 Hz over time. Compared to an STFT spectrogram, the wavelet scalogram offers higher frequency resolution at low frequencies and higher time resolution at high frequencies, demonstrating the adaptive resolution property.

DWT Example: Multi-Level Decomposition

We perform multi-level DWT decomposition on a noisy signal and visualize the approximation and detail coefficients at each level.

import numpy as np
import matplotlib.pyplot as plt
import pywt

# --- Generate test signal ---
np.random.seed(42)
fs = 500
t = np.arange(0, 1, 1 / fs)
# Low-frequency + high-frequency + noise
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
noise = 0.3 * np.random.randn(len(t))
signal = clean + noise

# --- Multi-level DWT decomposition ---
wavelet = "db4"  # Daubechies-4 wavelet
level = 5
coeffs = pywt.wavedec(signal, wavelet, level=level)
# coeffs = [cA5, cD5, cD4, cD3, cD2, cD1]

# --- Visualization ---
fig, axes = plt.subplots(level + 2, 1, figsize=(10, 10), sharex=False)

axes[0].plot(t, signal, "k", linewidth=0.5)
axes[0].set_title("Original Signal")
axes[0].set_ylabel("Amplitude")

axes[1].plot(coeffs[0], "b", linewidth=0.5)
axes[1].set_title(f"Approximation Coefficients (cA{level})")
axes[1].set_ylabel("Amplitude")

for i in range(1, level + 1):
    axes[i + 1].plot(coeffs[i], "r", linewidth=0.5)
    axes[i + 1].set_title(f"Detail Coefficients (cD{level - i + 1})")
    axes[i + 1].set_ylabel("Amplitude")

plt.tight_layout()
plt.show()

Examining the decomposition results reveals:

  • cA5 (approximation coefficients): Contains the lowest-frequency components (the 5 Hz sine wave)
  • cD1, cD2 (detail coefficients, high levels): Primarily contain high-frequency noise
  • cD3, cD4 (detail coefficients, mid levels): Contain the 50 Hz sine wave component

Signal Denoising with Wavelet Thresholding

A key application of DWT is wavelet thresholding for signal denoising. Since noise primarily appears in the detail coefficients, zeroing out coefficients below a threshold effectively removes noise.

import numpy as np
import matplotlib.pyplot as plt
import pywt

# --- Generate test signal ---
np.random.seed(42)
fs = 500
t = np.arange(0, 1, 1 / fs)
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
noise = 0.3 * np.random.randn(len(t))
signal = clean + noise

# --- DWT decomposition ---
wavelet = "db4"
level = 5
coeffs = pywt.wavedec(signal, wavelet, level=level)

# --- Thresholding ---
# Universal threshold (VisuShrink): lambda = sigma * sqrt(2 * log(N))
sigma = np.median(np.abs(coeffs[-1])) / 0.6745  # Robust noise estimation
threshold = sigma * np.sqrt(2 * np.log(len(signal)))

# Apply soft thresholding to detail coefficients
coeffs_thresh = [coeffs[0]]  # Keep approximation coefficients
for c in coeffs[1:]:
    coeffs_thresh.append(pywt.threshold(c, threshold, mode="soft"))

# --- Inverse DWT reconstruction ---
denoised = pywt.waverec(coeffs_thresh, wavelet)

# --- Visualization ---
fig, axes = plt.subplots(3, 1, figsize=(10, 7), sharex=True)

axes[0].plot(t, clean, "b", linewidth=0.8)
axes[0].set_title("Original Clean Signal")
axes[0].set_ylabel("Amplitude")

axes[1].plot(t, signal, "k", linewidth=0.5)
axes[1].set_title(f"Noisy Signal (SNR = {10*np.log10(np.var(clean)/np.var(noise)):.1f} dB)")
axes[1].set_ylabel("Amplitude")

axes[2].plot(t, denoised[: len(t)], "r", linewidth=0.8)
axes[2].set_title("Denoised Signal (Wavelet Thresholding)")
axes[2].set_ylabel("Amplitude")

for ax in axes:
    ax.grid(True, alpha=0.3)
axes[2].set_xlabel("Time [s]")

plt.tight_layout()
plt.show()

# --- Quantitative evaluation ---
mse_noisy = np.mean((signal - clean) ** 2)
mse_denoised = np.mean((denoised[: len(t)] - clean) ** 2)
print(f"MSE of noisy signal:    {mse_noisy:.6f}")
print(f"MSE after denoising:    {mse_denoised:.6f}")
print(f"MSE improvement:        {(1 - mse_denoised / mse_noisy) * 100:.1f}%")

Soft thresholding shrinks coefficients exceeding the threshold \(\lambda\) by \(\lambda\) and zeros out those below:

\[\text{soft}(x, \lambda) = \text{sign}(x) \cdot \max(|x| - \lambda, 0) \tag{12}\]

The universal threshold \(\lambda = \sigma\sqrt{2 \log N}\) (VisuShrink) is automatically determined from the noise standard deviation \(\sigma\) and signal length \(N\), requiring no parameter tuning. The noise standard deviation \(\sigma\) is robustly estimated using the MAD (Median Absolute Deviation).

Comparison: FFT vs STFT vs Wavelet

FeatureFFTSTFTWavelet
Time resolutionNoneFixedAdaptive (high at high-freq)
Frequency resolutionHigh (global)FixedAdaptive (high at low-freq)
Basis functionSinusoidWindowed sinusoidMother wavelet
Output1D spectrum2D (time-frequency)2D (time-scale)
Inverse transformYesYesYes
Complexity\(O(N \log N)\)\(O(N L \log L)\)CWT: \(O(NS)\), DWT: \(O(N)\)
Primary useStationary signalsSpeech/music analysisNon-stationary, denoising

Here \(L\) is the STFT window length and \(S\) is the number of CWT scales. DWT achieves \(O(N)\) complexity through its filter bank implementation, making it highly efficient.

Guidelines for choosing the appropriate method:

  • Stationary signals where you need the overall frequency content -> FFT
  • Time-varying frequencies tracked at uniform resolution -> STFT
  • Simultaneous detection of slow trends and rapid transients -> Wavelet Transform

Summary

This article covered the mathematical foundations and Python implementations of the wavelet transform.

  • CWT varies scale and translation continuously, achieving adaptive time-frequency resolution
  • DWT achieves efficient \(O(N)\) decomposition through dyadic sampling and filter banks
  • Wavelet thresholding is a practical denoising technique that applies thresholds to detail coefficients
  • FFT, STFT, and wavelet transforms each excel in different scenarios; choosing the right tool depends on the analysis target

The wavelet transform extends beyond signal processing into image compression (JPEG 2000), financial data analysis, biomedical signal processing, seismic analysis, and many other fields.

References