Time-Frequency Analysis Guide: Choosing FFT, STFT, Wavelet, and Hilbert Transform with Python

Compare FFT, STFT, Wavelet, and Hilbert transforms by time-frequency resolution, follow a use-case decision flow, and implement them with Python (scipy.signal.spectrogram / pywt.cwt / scipy.signal.hilbert).

Introduction

Time-frequency analysis is the umbrella name for techniques that estimate which frequencies are present in a signal, when they occur, and how strong they are at each instant. The flagship methods are the FFT, the STFT, the continuous and discrete wavelet transforms (CWT/DWT), the wavelet packet transform, the Hilbert transform, and the Hilbert-Huang transform (EMD followed by Hilbert spectral analysis). Each method has different assumptions, resolution trade-offs, computational cost, and signal classes for which it shines.

This article is a hub designed to answer the question: given my signal and goal, which method should I reach for, and with what parameters? We organize the landscape along three selection axes, summarize the methods in a feature matrix, walk through nine common application scenarios, and provide a unified Python evaluation framework that runs five methods side-by-side on the same chirp. Detailed derivations live in the per-method articles linked from this hub; this article keeps the map. Read it next to the Bode plot hub—its twin from the LTI-system design side—and you have a complete view of the frequency domain.

Three Selection Axes

Axis 1: Time resolution vs. frequency resolution (Heisenberg trade-off)

Any analysis window obeys the uncertainty principle

\[\Delta t \cdot \Delta f \ge \frac{1}{4\pi} \tag{1}\]

so a short time window blurs frequency content, and a fine frequency grid blurs time. Each method sits at a different point on this curve.

  • FFT sacrifices all time information for the sharpest possible frequency resolution. Intended for stationary signals.
  • STFT uses a fixed window, giving uniform time-frequency resolution across the entire band.
  • CWT scales the window with frequency: short at high frequencies, long at low frequencies. This is multi-resolution analysis.
  • Hilbert transform estimates the instantaneous amplitude and frequency at every sample. The time resolution is effectively one sample, but the signal must be narrowband.

Axis 2: Linear (FFT / STFT / Wavelet) vs. nonlinear (Hilbert-Huang)

A linear transform \(\mathcal{T}\) satisfies the superposition principle, \(\mathcal{T}(ax + by) = a\mathcal{T}(x) + b\mathcal{T}(y)\) . The FFT, STFT, CWT, DWT, and the wavelet packet transform are all linear—they project the signal onto fixed basis functions chosen a priori.
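The superposition property is easy to check numerically. The snippet below is a minimal sketch using random test vectors (the vectors and the coefficients `a`, `b` are arbitrary choices, not from the article); it also shows that taking the magnitude of the spectrum, a step most plots add on top of the transform, is not linear.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
y = rng.standard_normal(256)
a, b = 2.0, -0.5            # arbitrary coefficients for the check

# Superposition: T(a*x + b*y) == a*T(x) + b*T(y) for the FFT
lhs = np.fft.fft(a * x + b * y)
rhs = a * np.fft.fft(x) + b * np.fft.fft(y)
fft_is_linear = np.allclose(lhs, rhs)

# The magnitude spectrum discards phase, so superposition no longer holds
mag_rhs = a * np.abs(np.fft.fft(x)) + b * np.abs(np.fft.fft(y))
magnitude_is_linear = np.allclose(np.abs(lhs), mag_rhs)

print("FFT linear:", fft_is_linear)
print("|FFT| linear:", magnitude_is_linear)
```

The same check applies unchanged to STFT or wavelet coefficients, as long as it is done on the complex coefficients rather than on a spectrogram or scalogram.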

The Hilbert-Huang transform (HHT) is fundamentally different. Its first stage, EMD (Empirical Mode Decomposition), extracts a small set of Intrinsic Mode Functions (IMFs) directly from the data, in a way that depends on the signal itself. This makes HHT applicable to nonlinear and non-stationary signals where no fixed basis works well, at the cost of rigorous mathematical foundations.

Axis 3: Stationary vs. non-stationary signals

  • Stationary: statistics (mean, variance, spectrum) do not change over time → FFT with windowing and PSD is enough.
  • Quasi-stationary: stationary on short segments (speech is stationary on 20–30 ms windows) → STFT.
  • Non-stationary: frequency content changes over time (chirps, transients) → CWT, wavelet packet, or HHT.

Smoothing and detrending tools such as the moving average and the exponential moving average (EMA) act as low-pass filters along the time axis and serve as common pre-processing before any of the above transforms.
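To make the low-pass claim concrete, here is a small sketch that evaluates the frequency response of both smoothers with `scipy.signal.freqz`. The sampling rate, the window length `M`, and the smoothing factor `alpha` are illustrative assumptions, not recommended settings.

```python
import numpy as np
from scipy.signal import freqz

fs = 1000.0      # sampling rate [Hz] (assumed)
M = 10           # moving-average length in samples (assumed)
alpha = 0.2      # EMA smoothing factor (assumed)

# Moving average: FIR filter with M equal taps
b_ma, a_ma = np.ones(M) / M, [1.0]
# EMA: y[n] = alpha * x[n] + (1 - alpha) * y[n-1], a one-pole IIR filter
b_ema, a_ema = [alpha], [1.0, -(1.0 - alpha)]

f, H_ma = freqz(b_ma, a_ma, worN=2048, fs=fs)
_, H_ema = freqz(b_ema, a_ema, worN=2048, fs=fs)

# The moving average has its first spectral null at fs / M
first_null_hz = fs / M
idx_null = np.argmin(np.abs(f - first_null_hz))

print(f"DC gain      MA: {abs(H_ma[0]):.3f}  EMA: {abs(H_ema[0]):.3f}")
print(f"Near Nyquist MA: {abs(H_ma[-1]):.3f}  EMA: {abs(H_ema[-1]):.3f}")
print(f"MA gain at {first_null_hz:.0f} Hz: {abs(H_ma[idx_null]):.4f}")
```

Both filters pass DC with unit gain and attenuate toward the Nyquist frequency; the moving average additionally places nulls at multiples of `fs / M`, which matters if a component of interest sits near one of them.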

Feature Comparison Matrix

| Method | Time localization | Frequency localization | Complexity | Signal class | Typical uses |
| --- | --- | --- | --- | --- | --- |
| FFT | none | maximal (\(1/N\)) | \(O(N \log N)\) | stationary | spectrum, PSD, harmonic analysis |
| STFT | medium (window) | medium (\(1/L\)) | \(O(N \log L)\) | quasi-stationary | speech, spectrograms |
| CWT | high at high freq. | high at low freq. | \(O(N \log N)\) | non-stationary | transients, singularities, biosignals |
| DWT | octave-graded | octave-graded | \(O(N)\) | non-stationary | compression, denoising |
| Wavelet packet | medium–high (adaptive) | medium–high (adaptive) | \(O(N \log N)\) | non-stationary | best-basis selection, feature extraction |
| Hilbert transform | maximal (per sample) | single (narrowband) | \(O(N \log N)\) | narrowband AM/FM | instantaneous amplitude/frequency, demod |
| Hilbert-Huang (EMD + HSA) | maximal | adaptive (per IMF) | \(O(N \cdot M)\) | nonlinear / non-stationary | biosignals, seismology, oceanography |

Here \(L\) is the STFT window length and \(M\) is the number of EMD sifting iterations.

Decision Flow: Nine Scenarios

  1. Pure spectrum estimation (e.g., identifying harmonics of motor excitation) → FFT with a Hann/Blackman window and Welch’s method.
  2. Transient analysis (impact response, defect detection) → CWT with a Morlet mother wavelet, or wavelet packet.
  3. Narrowband tone extraction and demodulation (AM/FM, PLL front-ends) → Hilbert transform after a tight bandpass filter.
  4. Mode decomposition (separating coexisting vibration modes) → EMD followed by Hilbert spectral analysis (HHT).
  5. Speech and speaker analysis (formants, F0 tracking) → STFT with the canonical 25 ms window and 10 ms hop.
  6. Mechanical vibration diagnostics (bearing faults, gear mesh) → envelope analysis with the Hilbert transform followed by FFT of the envelope.
  7. Biosignals (ECG, EEG, EMG) → CWT or HHT, both of which handle non-stationarity and nonlinearity well.
  8. Image and other 2D signals (feature extraction, compression) → 2D DWT.
  9. Detrending and pre-processing (high-frequency noise suppression) → moving average or EMA before any of the above.

A practical heuristic: try FFT first; if you need time localization, switch to STFT; if STFT is too rigid, switch to wavelets; if you need instantaneous quantities, switch to Hilbert. This order runs from the simplest tool to the most involved, so you rarely end up with a heavier method than the problem needs.
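Scenario 6 (envelope analysis) is the least obvious pipeline in the list, so here is a minimal sketch of it. The signal is a synthetic amplitude-modulated tone standing in for a bearing-fault signature; the sampling rate, the 2 kHz carrier, and the 37 Hz modulation rate are made-up values for illustration.

```python
import numpy as np
from scipy.signal import hilbert

fs = 10_000                        # sampling rate [Hz] (assumed)
t = np.arange(fs) / fs             # 1 s of data
f_carrier, f_fault = 2_000, 37     # resonance and fault rate [Hz] (made-up)

# Amplitude-modulated resonance plus noise: a crude stand-in for a fault signature
rng = np.random.default_rng(0)
x = (1 + 0.8 * np.cos(2 * np.pi * f_fault * t)) * np.sin(2 * np.pi * f_carrier * t)
x += 0.1 * rng.standard_normal(t.size)

# Step 1: envelope = magnitude of the analytic signal
envelope = np.abs(hilbert(x))
envelope -= envelope.mean()        # remove DC so it does not dominate the FFT

# Step 2: FFT of the envelope reveals the modulation (fault) rate
spectrum = np.abs(np.fft.rfft(envelope * np.hanning(envelope.size)))
freqs = np.fft.rfftfreq(envelope.size, 1 / fs)
peak_hz = freqs[np.argmax(spectrum)]

print(f"Dominant envelope frequency: {peak_hz:.1f} Hz")
```

On real vibration data the Hilbert step is normally preceded by a bandpass filter around the structural resonance, in line with the narrowband requirement of scenario 3.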

Unified Evaluation Framework: Five Methods on One Chirp

A linear chirp—where the instantaneous frequency rises linearly with time—plus a stationary tone is a standard stress test for time-frequency methods. The script below runs FFT, STFT, CWT, wavelet packet, and Hilbert analysis on the same signal so you can read the strengths and weaknesses off one figure.

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import chirp, spectrogram, hilbert
import pywt

fs = 1000  # sampling rate [Hz]
T = 2.0    # duration [s]
t = np.linspace(0, T, int(fs * T), endpoint=False)

# 10 Hz -> 200 Hz linear chirp plus a steady 100 Hz tone, with noise
x = chirp(t, f0=10, f1=200, t1=T, method="linear") + 0.5 * np.sin(2 * np.pi * 100 * t)
x += 0.1 * np.random.randn(len(t))

fig, axes = plt.subplots(5, 1, figsize=(10, 14))

# (1) FFT - no time information, best frequency resolution
X = np.fft.rfft(x)
f = np.fft.rfftfreq(len(x), 1 / fs)
axes[0].plot(f, 20 * np.log10(np.abs(X) + 1e-12))
axes[0].set(title="FFT (no time)", xlabel="Hz", ylabel="dB")

# (2) STFT - uniform time-frequency grid
f_s, t_s, Sxx = spectrogram(x, fs, nperseg=128, noverlap=96)
axes[1].pcolormesh(t_s, f_s, 10 * np.log10(Sxx + 1e-12), shading="gouraud")
axes[1].set(title="STFT (fixed window)", xlabel="s", ylabel="Hz")

# (3) CWT - multi-resolution (sharp time at high freq., sharp freq. at low freq.)
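# Log-spaced scales covering about 5-250 Hz; linear scales 1..127 would start
# above the Nyquist frequency and sample the upper band very coarsely.
target_freqs = np.geomspace(5, 250, 96)
scales = pywt.central_frequency("morl") * fs / target_freqs
coef, freqs = pywt.cwt(x, scales, "morl", sampling_period=1 / fs)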
axes[2].pcolormesh(t, freqs, np.abs(coef), shading="gouraud")
axes[2].set(title="CWT Morlet", xlabel="s", ylabel="Hz", ylim=(0, 250))

# (4) Wavelet packet - adaptive frequency bands
wp = pywt.WaveletPacket(data=x, wavelet="db4", maxlevel=5)
# "freq" order sorts the nodes by frequency band so the Hz axis below is valid
nodes = [n.path for n in wp.get_level(5, "freq")]
wp_mat = np.array([wp[n].data for n in nodes])
axes[3].imshow(np.abs(wp_mat), aspect="auto", origin="lower",
               extent=[0, T, 0, fs / 2])
axes[3].set(title="Wavelet Packet (level 5)", xlabel="s", ylabel="Hz")

# (5) Hilbert - instantaneous envelope and frequency (narrowband assumption)
analytic = hilbert(x)
envelope = np.abs(analytic)
inst_freq = np.diff(np.unwrap(np.angle(analytic))) * fs / (2 * np.pi)
axes[4].plot(t, envelope, label="Envelope")
axes[4].plot(t[1:], inst_freq / 20, label="Inst. freq. /20", alpha=0.7)
axes[4].set(title="Hilbert", xlabel="s")
axes[4].legend()

plt.tight_layout()
plt.show()

The FFT panel shows two features: a sharp peak at 100 Hz and a broad plateau spanning 10–200 Hz where the chirp's energy is averaged over the whole record, with no hint that the chirp swept across time. The STFT panel reveals a diagonal ridge (the chirp) plus a horizontal line (the tone), but the ridge is coarse at low frequencies, where the fixed 128-sample window gives only about 7.8 Hz of frequency resolution. The CWT panel renders the same diagonal smoothly because its window grows toward low frequencies. The wavelet packet panel splits the band into 32 equal sub-bands at level 5; pruning that tree with a best-basis criterion is what makes the bands adaptive, and it is the starting point for best-basis classification. The Hilbert panel cannot represent two simultaneous components honestly (its instantaneous frequency curve is meaningful only inside a narrow band), but its envelope tracks the combined amplitude of the signal with one-sample resolution.

This is the unified evaluation pattern: keep the signal fixed, switch the method, read the figure. Apply the same five-up layout to your own data and the right tool usually becomes obvious within minutes.

Design Parameters

| Method | Key parameters | Guidance |
| --- | --- | --- |
| FFT | window / length \(N\) | Hann by default; Blackman when leakage matters; \(N = 2^k\) for speed; the bin spacing is \(f_s/N\) |
| STFT | window length / hop | Speech: 25 ms window, 10 ms hop. Choose \(L\) so that \(1/\Delta f \le L/f_s \le \Delta t\) for the target resolutions \(\Delta f\) and \(\Delta t\) |
| CWT | mother wavelet | Vibration / transients → Morlet; edges → Mexican Hat; biosignals → db4 (a discrete wavelet, used with the DWT). Log-spaced scales |
| Wavelet packet | depth / wavelet | level \(\approx \lfloor \log_2 N \rfloor - 3\). Best basis via Shannon entropy |
| Hilbert | upstream bandpass | Bandpass to fc ± Δf so the input is narrowband; otherwise the instantaneous frequency oscillates erratically |

Window-length intuition for STFT. With a window of \(L\) samples and sampling rate \(f_s\), the frequency resolution is \(\Delta f \approx f_s / L\) and the time resolution is \(\Delta t = L / f_s\). Example: at \(f_s = 1000\) Hz with \(L = 128\), you get \(\Delta f \approx 7.8\) Hz and \(\Delta t = 128\) ms, so any event shorter than 128 ms or any pair of components closer than about 8 Hz cannot be resolved by that configuration. If you need fine time resolution at high frequencies and fine frequency resolution at low frequencies in the same plot, use wavelets; no method resolves both arbitrarily well at the same frequency, as (1) shows.
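The arithmetic above can be wrapped in a two-line helper and cross-checked against the grid that `scipy.signal.spectrogram` returns. The helper name `stft_resolution` is hypothetical, introduced only for this sketch.

```python
import numpy as np
from scipy.signal import spectrogram

def stft_resolution(fs, nperseg):
    """Return (delta_f [Hz], delta_t [s]) for a window of nperseg samples."""
    return fs / nperseg, nperseg / fs

fs, L = 1000, 128
delta_f, delta_t = stft_resolution(fs, L)
print(f"delta_f = {delta_f:.2f} Hz, delta_t = {delta_t * 1e3:.0f} ms")

# Cross-check against the grid scipy produces for the same settings
x = np.zeros(4 * L)                                  # content is irrelevant here
f_bins, t_bins, _ = spectrogram(x, fs, nperseg=L, noverlap=96)
bin_spacing = f_bins[1] - f_bins[0]                  # equals fs / nperseg
hop_seconds = t_bins[1] - t_bins[0]                  # (nperseg - noverlap) / fs

print(f"bin spacing = {bin_spacing:.2f} Hz, column spacing = {hop_seconds * 1e3:.0f} ms")
```

Note that the hop only sets how densely the columns are sampled in time. A 32-sample hop draws a column every 32 ms, but each column still averages over the full 128 ms window.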

Mother wavelet intuition for CWT. Morlet is a Gaussian-modulated complex sinusoid, giving smooth time-frequency atoms—ideal for oscillatory transients. Mexican Hat (Ricker) is the second derivative of a Gaussian, sharper in time, and better at locating singularities and edges. Daubechies db4 is a compactly supported orthogonal wavelet often used for DWT-based denoising of biosignals.

Hilbert pre-conditioning. The Hilbert transform’s instantaneous frequency is only physically meaningful for narrowband signals. If your signal contains two coexisting components, the unwrapped phase oscillates wildly; pre-filter to isolate one component or switch to EMD/HHT.
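A short sketch of the pre-filtering step, assuming a two-tone test signal (100 Hz and 180 Hz, chosen for illustration) and a 4th-order Butterworth bandpass as the front-end. The filter order and the 80–120 Hz band are assumptions; any sufficiently selective zero-phase bandpass would do.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 1000
t = np.arange(2 * fs) / fs
# Two coexisting tones (illustrative values)
x = np.sin(2 * np.pi * 100 * t) + 0.8 * np.sin(2 * np.pi * 180 * t)

def inst_freq(sig, fs):
    """Instantaneous frequency [Hz] from the unwrapped analytic phase."""
    phase = np.unwrap(np.angle(hilbert(sig)))
    return np.diff(phase) * fs / (2 * np.pi)

# Isolate the 100 Hz component with a zero-phase bandpass (assumed design)
sos = butter(4, [80, 120], btype="bandpass", fs=fs, output="sos")
x_nb = sosfiltfilt(sos, x)

core = slice(200, -200)                       # ignore filter edge effects
raw_spread = np.std(inst_freq(x, fs)[core])
nb_spread = np.std(inst_freq(x_nb, fs)[core])
nb_median = np.median(inst_freq(x_nb, fs)[core])

print(f"Std of inst. freq. without bandpass: {raw_spread:.1f} Hz")
print(f"Std of inst. freq. with bandpass:    {nb_spread:.3f} Hz")
print(f"Median inst. freq. with bandpass:    {nb_median:.1f} Hz")
```

The spread of the raw instantaneous frequency is a quick diagnostic: if it is large compared with the bandwidth you expect, the input is not narrowband enough for the Hilbert estimate to be trusted.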

When Each Method Fails (and the Workarounds)

Every transform has a domain where it breaks, and recognizing the failure mode is as important as choosing the method.

FFT on a non-stationary signal flattens the time axis into a single average. A chirp that sweeps from 10 Hz to 200 Hz looks like a wide, weak plateau—nothing tells you the energy was actually concentrated at one frequency at any given time. The standard fix is to segment the signal into short stationary chunks and apply the FFT to each chunk independently, which is exactly what the STFT does. If the segmentation is awkward (the signal is non-stationary on every time scale), reach for wavelets instead.
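One way to see how completely the magnitude spectrum discards timing: an up-chirp and its time-reversed copy (a down-chirp) have identical magnitude spectra. The sketch below assumes the same 10–200 Hz sweep used elsewhere in this article.

```python
import numpy as np
from scipy.signal import chirp

fs = 1000
t = np.arange(2 * fs) / fs
up = chirp(t, f0=10, f1=200, t1=t[-1], method="linear")   # sweeps 10 -> 200 Hz
down = up[::-1]                                           # sweeps 200 -> 10 Hz

mag_up = np.abs(np.fft.rfft(up))
mag_down = np.abs(np.fft.rfft(down))

# Time reversal of a real signal only conjugates and phase-shifts the spectrum
same_spectrum = np.allclose(mag_up, mag_down)
print("Identical magnitude spectra:", same_spectrum)
```

The ordering information lives entirely in the phase, which is why a magnitude or PSD plot alone cannot distinguish the two signals.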

STFT on a signal whose features span widely separated frequencies and time scales. A window short enough to localize transients at 100 Hz covers only a fraction of a 1 Hz cycle and cannot resolve that component's frequency; a window long enough to capture 1 Hz cleanly will average away every fast transient at 100 Hz. This is the canonical “STFT is rigid” complaint, and it is the historical motivation for wavelets. If your signal has features at very different time scales, switch to the CWT (or a multi-resolution wavelet packet).

CWT and visual interpretation. The CWT scalogram is dense, beautiful, and easy to over-interpret. Ridges that look like physical components can be artifacts of the mother wavelet’s time-frequency footprint, especially near signal boundaries (cone of influence). Always plot the cone of influence and verify ridges by reconstructing the signal from a single scale.

Wavelet packet and basis explosion. A wavelet packet tree of depth \(J\) has \(2^J\) leaves and a combinatorial number of admissible bases. Without a principled best-basis criterion (Coifman-Wickerhauser entropy is the standard), you risk over-fitting noise. Cross-validate the chosen basis on held-out data when using wavelet packets for classification.
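As a rough illustration of an entropy-based best-basis search, the sketch below walks a PyWavelets packet tree bottom-up and keeps a node whenever its Shannon cost is no worse than the combined cost of its two children. The helpers `shannon_cost` and `best_basis` are hypothetical names written for this example, not PyWavelets functions, and the test signal is made up. Periodization mode is assumed so that the decomposition stays orthogonal and the additive cost is comparable across levels.

```python
import numpy as np
import pywt

def shannon_cost(coeffs, total_energy):
    """Additive Shannon entropy cost of a coefficient vector."""
    p = np.asarray(coeffs, dtype=float) ** 2 / total_energy
    p = p[p > 0]                                  # treat 0 * log(0) as 0
    return -np.sum(p * np.log(p))

def best_basis(wp, path, total_energy):
    """Return (list of node paths, cost); a parent is kept unless its children are cheaper."""
    data = wp.data if path == "" else wp[path].data
    own = shannon_cost(data, total_energy)
    if len(path) >= wp.maxlevel:                  # deepest level: nothing to split
        return [path], own
    left, cost_l = best_basis(wp, path + "a", total_energy)
    right, cost_r = best_basis(wp, path + "d", total_energy)
    if cost_l + cost_r < own:
        return left + right, cost_l + cost_r
    return [path], own

fs = 1000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 60 * t)                    # steady tone ...
x[500:516] += np.hanning(16)                      # ... plus a short bump
wp = pywt.WaveletPacket(data=x, wavelet="db4", mode="periodization", maxlevel=4)

basis, cost = best_basis(wp, "", np.sum(x ** 2))
print("Selected nodes:", basis)
print(f"Entropy cost: {cost:.3f}")
```

The selected paths always tile the frequency axis exactly once, so they form a valid basis; whether that basis generalizes is a separate question, which is where the held-out validation comes in.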

Hilbert and broadband signals. Apply the Hilbert transform to a broadband signal and the instantaneous frequency curve will oscillate violently and even go negative. This is a hallmark of misuse; the transform is mathematically defined for any signal but is physically meaningful only when the signal has a single dominant frequency at each instant. The fix is to bandpass-filter into narrow sub-bands and Hilbert-analyze each one. Replace the fixed sub-bands with data-driven components and you arrive at the idea behind HHT.

HHT and reproducibility. EMD’s sifting process depends on cubic spline interpolation of envelopes and is sensitive to noise and endpoint handling. Different implementations can produce different IMFs from the same signal. For production systems, fix the interpolation scheme, the boundary handling, and the stopping criterion (plus the random seed if you use a noise-assisted variant such as ensemble EMD), and document them as part of your pipeline.

Reading the Trade-offs Through a Single Number

A useful summary statistic is the time-bandwidth product of the analyzing window or wavelet, often denoted \(\sigma_t \sigma_f\). The smaller it is, the closer the method approaches the Heisenberg lower bound of \(1/(4\pi) \approx 0.0796\) from (1). With frequency measured in Hz and the spreads taken as standard deviations of the energy densities, a Hann window has \(\sigma_t \sigma_f \approx 0.082\), a Blackman window about \(0.080\), and the Gaussian envelope of the Morlet wavelet attains the bound. When a paper quotes “near-optimal joint resolution,” it means the analyzing function is close to this bound.
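The product can be estimated numerically for any window. The sketch below assumes frequency in Hz and defines the spreads as standard deviations of the normalized energy densities \(|w(t)|^2\) and \(|W(f)|^2\); the helper name `time_bandwidth`, the window length, and the Gaussian width are choices made for this example.

```python
import numpy as np

def time_bandwidth(w, dt=1.0):
    """Estimate sigma_t * sigma_f of a real, symmetric window (frequency in Hz)."""
    w = np.asarray(w, dtype=float)
    n = w.size
    t = (np.arange(n) - (n - 1) / 2) * dt          # time axis centered on the window
    p_t = w ** 2 / np.sum(w ** 2)                  # energy density in time
    sigma_t = np.sqrt(np.sum(t ** 2 * p_t))
    nfft = 8 * n                                   # zero-pad for a finer frequency grid
    power = np.abs(np.fft.fft(w, nfft)) ** 2
    f = np.fft.fftfreq(nfft, dt)
    p_f = power / np.sum(power)                    # energy density in frequency
    sigma_f = np.sqrt(np.sum(f ** 2 * p_f))
    return sigma_t * sigma_f

n = 2048
t = np.arange(n) - (n - 1) / 2
windows = {
    "hann": np.hanning(n),
    "blackman": np.blackman(n),
    "gaussian": np.exp(-0.5 * (t / (n / 16)) ** 2),   # truncated at about 8 sigma
}
bound = 1 / (4 * np.pi)
products = {name: time_bandwidth(w) for name, w in windows.items()}
for name, value in products.items():
    print(f"{name:9s} sigma_t*sigma_f = {value:.4f}  ({value / bound:.3f} x bound)")
```

Other conventions exist (angular frequency, or bandwidths measured at −3 dB) and give different absolute numbers, so compare products only when they are computed under the same definition.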

The practical implication: if two methods agree on the time-bandwidth product, the difference between them is not resolution but what they assume about the signal (stationarity, linearity, narrowbandness). That is why the decision flow above keys on signal class rather than on resolution numbers.

Summary

Choosing the right time-frequency method comes down to five mechanical questions:

  1. Do I need only the spectrum, with no time information? → FFT.
  2. Do I need time and frequency, with a uniform grid? → STFT.
  3. Do I need adaptive resolution—sharp time at high frequency and sharp frequency at low frequency? → CWT or wavelet packet.
  4. Do I need instantaneous amplitude or frequency, sample by sample? → Hilbert transform (with a narrowband front-end).
  5. Is the signal nonlinear and non-stationary, with physically meaningful modes? → Hilbert-Huang transform.

When in doubt, take the chirp script above, swap in your signal, and let the five panels make the choice for you. The detailed articles linked from this hub then carry you from “I picked a method” to “I tuned its parameters and shipped.” Bouncing between this hub and the per-method posts is the fastest path from a raw waveform to a publishable plot.