Window Functions and Power Spectral Density (PSD): Theory and Python Implementation

A comprehensive guide to window functions for FFT analysis and Power Spectral Density estimation using Welch's method, with full Python implementations.

Introduction

In the FFT article, we covered the DFT algorithm and basic frequency analysis. We briefly touched on using the Hann window to mitigate spectral leakage, but there are many types of window functions, and choosing the right one for the task at hand is essential.

Moreover, the amplitude spectrum from FFT shows “how much amplitude exists at each frequency,” but in practice we often need to know the signal power per unit frequency. This quantity is the Power Spectral Density (PSD).

This article dives deeper into the mathematics behind spectral leakage, compares the characteristics of major window functions, and implements PSD estimation via Welch’s method in Python.

Mathematical Understanding of Spectral Leakage

Finite-Length Signals and the Rectangular Window

In practice, we can never record a signal for infinite duration. We acquire a finite-length signal \(x[n]\) (\(n = 0, 1, \ldots, N-1\)). Mathematically, this is equivalent to multiplying an infinite-length signal \(x_\infty[n]\) by a rectangular window \(w_R[n]\).

\[x[n] = x_\infty[n] \cdot w_R[n] \tag{1}\]

The rectangular window is defined as:

\[w_R[n] = \begin{cases} 1, & 0 \leq n \leq N-1 \\ 0, & \text{otherwise} \end{cases} \tag{2}\]

Effect in the Frequency Domain

Multiplication in the time domain corresponds to convolution in the frequency domain.

\[X(f) = X_\infty(f) * W_R(f) \tag{3}\]

The Fourier transform of the rectangular window is the Dirichlet kernel, which in continuous approximation is proportional to the sinc function.

\[W_R(f) \approx N \cdot \text{sinc}(Nf) \cdot e^{-j\pi f(N-1)} \tag{4}\]

This sinc function has a broad mainlobe and slowly decaying sidelobes. Through the convolution in Eq. \((3)\), a spectrum that should be concentrated at a single frequency spreads to adjacent frequencies via these sidelobes. This is the mathematical origin of spectral leakage.

Mainlobe and Sidelobes

The frequency characteristics of a window function are described by two key elements:

  • Mainlobe: The primary peak around the center frequency. A narrower mainlobe yields higher frequency resolution.
  • Sidelobes: Secondary peaks on either side of the mainlobe. Lower sidelobe levels mean less spectral leakage.

The rectangular window has a peak sidelobe level of only \(-13\) dB, which means weak signal components near a strong tone can be buried under the sidelobes. By choosing different window functions, we can control this tradeoff.

Major Window Functions

Below are the mathematical definitions of commonly used window functions, all of length \(N\).

Rectangular Window

\[w[n] = 1, \quad 0 \leq n \leq N-1 \tag{5}\]

Equivalent to applying no window. It has the narrowest mainlobe and therefore the best frequency resolution, but the sidelobes are large and spectral leakage is severe.

Hann Window

\[w[n] = 0.5\left(1 - \cos\left(\frac{2\pi n}{N-1}\right)\right) \tag{6}\]

A cosine-based window that tapers to zero at both ends. The sidelobe decay rate of \(-18\) dB/oct is fast, making it the most widely used general-purpose window.

Hamming Window

\[w[n] = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right) \tag{7}\]

Similar to the Hann window, but the endpoints do not reach zero (they have a value of approximately 0.08). The first sidelobe is suppressed to \(-43\) dB, but the far sidelobe decay is only \(-6\) dB/oct, slower than the Hann window.

Blackman Window

\[w[n] = 0.42 - 0.5\cos\left(\frac{2\pi n}{N-1}\right) + 0.08\cos\left(\frac{4\pi n}{N-1}\right) \tag{8}\]

Composed of three cosine terms. The first sidelobe is extremely low at \(-58\) dB, making it suitable for high dynamic range analysis, but the wider mainlobe reduces frequency resolution.

Kaiser Window

\[w[n] = \frac{I_0\left(\beta\sqrt{1 - \left(\frac{2n}{N-1} - 1\right)^2}\right)}{I_0(\beta)} \tag{9}\]

Here \(I_0\) is the zeroth-order modified Bessel function of the first kind, and \(\beta\) is the shape parameter. By adjusting \(\beta\), you can continuously control the tradeoff between mainlobe width and sidelobe suppression. \(\beta = 0\) gives the rectangular window, and \(\beta \approx 5.4\) approximates the Hamming window.

Window Function Characteristics Comparison

WindowMainlobe WidthPeak Sidelobe [dB]Sidelobe DecayENBW (bins)
Rectangular2 bins-13-6 dB/oct1.00
Hann4 bins-32-18 dB/oct1.50
Hamming4 bins-43-6 dB/oct1.36
Blackman6 bins-58-18 dB/oct1.73
Kaiser (β=6)VariableVariableVariableVariable

ENBW (Equivalent Noise Bandwidth) indicates how many frequency bins of bandwidth a window effectively has for white noise. A larger value means more noise influence on the estimate.

Window Functions: Frequency Response Comparison (Python)

Let us generate all window functions and compare their frequency responses on a dB scale.

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal.windows import hann, hamming, blackman, kaiser

# Window length
N = 64
# FFT size (zero-padded for interpolation)
N_fft = 4096

# Generate each window function
windows = {
    'Rectangular': np.ones(N),
    'Hann': hann(N),
    'Hamming': hamming(N),
    'Blackman': blackman(N),
    'Kaiser (β=6)': kaiser(N, beta=6),
}

# Compute and plot frequency responses
fig, ax = plt.subplots(figsize=(10, 6))

for name, w in windows.items():
    # Compute frequency response via FFT
    W = np.fft.fft(w, n=N_fft)
    W_shift = np.fft.fftshift(W)
    # Normalize (mainlobe peak at 0 dB)
    W_dB = 20 * np.log10(np.maximum(np.abs(W_shift) / np.abs(W_shift).max(), 1e-12))
    # Frequency axis (in bins)
    freq_bins = np.linspace(-N / 2, N / 2, N_fft)
    ax.plot(freq_bins, W_dB, label=name)

ax.set_xlim(-15, 15)
ax.set_ylim(-120, 5)
ax.set_xlabel('Frequency [bins]')
ax.set_ylabel('Magnitude [dB]')
ax.set_title('Window Functions: Frequency Response Comparison')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

This plot visually confirms that the rectangular window has the narrowest mainlobe but the largest sidelobes, while the Blackman window has very low sidelobes at the cost of a wider mainlobe.

Power Spectral Density (PSD)

Definition and Physical Meaning

The amplitude spectrum \(|X[k]|\) shows the signal amplitude at each frequency bin, but it depends on the FFT length \(N\) and sampling rate \(f_s\), making direct comparison between spectra obtained under different conditions difficult.

Power Spectral Density (PSD) is defined as signal power per unit frequency and shows how a signal’s power is distributed across the frequency axis. Its unit is \(\text{V}^2/\text{Hz}\) (or the appropriate physical unit squared per Hz).

An important property of PSD is that integrating it over all frequencies yields the total signal power (variance).

\[\int_0^{f_s/2} S_{xx}(f) \, df = \sigma_x^2 \tag{10}\]

Difference from Amplitude Spectrum

PropertyAmplitude SpectrumPower Spectral Density
Value\(\|X[k]\|\)\(S_{xx}(f)\)
UnitV (signal unit)\(\text{V}^2/\text{Hz}\)
N dependencyProportional to NIndependent of N
Use caseCheck amplitude at a frequencyQuantitative power distribution

PSD Estimation Methods

Periodogram

The simplest PSD estimator is the periodogram, computed directly from the FFT result.

\[\hat{S}_{xx}[k] = \frac{|X[k]|^2}{N \cdot f_s \cdot U} \tag{11}\]

Here \(X[k]\) is the DFT of the windowed signal and \(U = \frac{1}{N}\sum_{n=0}^{N-1} w[n]^2\) is the window power correction factor (\(U=1\) for the rectangular window). The periodogram is asymptotically unbiased (unbiased as \(N \to \infty\)), but suffers from high variance. Even increasing the data length \(N\) does not reduce the variance, so the estimate fluctuates significantly due to noise.

Welch’s Method

Welch’s method (1967) is a practical technique for reducing periodogram variance. The procedure is:

  1. Divide the signal into segments of length \(L\) with overlap
  2. Apply a window function \(w[n]\) to each segment
  3. Compute the modified periodogram for each segment
  4. Average all segment results

The modified periodogram of the \(i\)-th segment is defined as:

\[\hat{S}_{xx}^{(i)}[k] = \frac{1}{L \cdot f_s \cdot U} \left| \sum_{n=0}^{L-1} x_i[n] \cdot w[n] \cdot e^{-j2\pi kn/L} \right|^2 \tag{12}\]

where \(U\) is the window power normalization factor:

\[U = \frac{1}{L}\sum_{n=0}^{L-1} w[n]^2 \tag{13}\]

Averaging over \(K\) segments gives the Welch PSD estimate:

\[\hat{S}_{xx}^{\text{Welch}}[k] = \frac{1}{K}\sum_{i=1}^{K} \hat{S}_{xx}^{(i)}[k] \tag{14}\]

Overlap is used to make effective use of the data and increase the number of segments available for averaging. A 50% overlap is typical. Excessive overlap increases inter-segment correlation and diminishes the averaging benefit.

Segment length tradeoff: longer segments improve frequency resolution but yield fewer segments for averaging (higher variance). Shorter segments reduce variance but degrade frequency resolution.

Practical PSD Estimation in Python

We generate a test signal with two sinusoids (50 Hz and 120 Hz) plus white noise, then compare PSD estimates from the periodogram and Welch’s method.

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import welch
from scipy.signal.windows import hann

# --- Generate test signal ---
np.random.seed(0)
fs = 1000            # Sampling frequency [Hz]
T = 4.0              # Signal duration [s]
t = np.arange(0, T, 1/fs)
N_total = len(t)

# 50 Hz (amplitude 1.0) and 120 Hz (amplitude 0.5) + white noise
signal = (np.sin(2 * np.pi * 50 * t)
          + 0.5 * np.sin(2 * np.pi * 120 * t)
          + 0.8 * np.random.randn(N_total))

# --- 1. Periodogram (manual implementation) ---
window = hann(N_total)
X = np.fft.fft(signal * window)
freqs_periodo = np.fft.fftfreq(N_total, 1/fs)
# Window power correction
U = np.mean(window**2)
# One-sided periodogram
psd_periodo = (np.abs(X[:N_total//2])**2) / (N_total * fs * U)
psd_periodo[1:-1] *= 2  # Double all bins except DC and Nyquist
freqs_periodo = freqs_periodo[:N_total//2]

# --- 2. Welch's method (scipy.signal.welch) ---
freqs_welch, psd_welch = welch(signal, fs=fs, window='hann',
                                nperseg=512, noverlap=256)

# --- 3. Window function comparison ---
windows_list = ['hann', 'hamming', 'blackman']
psd_results = {}
for win_name in windows_list:
    f, p = welch(signal, fs=fs, window=win_name,
                 nperseg=512, noverlap=256)
    psd_results[win_name] = (f, p)

# --- Plot ---
fig, axes = plt.subplots(3, 1, figsize=(10, 12))

# (a) Periodogram vs Welch's method
axes[0].semilogy(freqs_periodo, psd_periodo, alpha=0.5, label='Periodogram')
axes[0].semilogy(freqs_welch, psd_welch, label='Welch (nperseg=512)')
axes[0].set_xlabel('Frequency [Hz]')
axes[0].set_ylabel('PSD [V²/Hz]')
axes[0].set_title('Periodogram vs Welch Method')
axes[0].legend()
axes[0].set_xlim(0, 200)
axes[0].grid(True, alpha=0.3)

# (b) Welch's method with different windows
for win_name, (f, p) in psd_results.items():
    axes[1].semilogy(f, p, label=f'Welch ({win_name})')
axes[1].set_xlabel('Frequency [Hz]')
axes[1].set_ylabel('PSD [V²/Hz]')
axes[1].set_title('Welch Method: Window Function Comparison')
axes[1].legend()
axes[1].set_xlim(0, 200)
axes[1].grid(True, alpha=0.3)

# (c) Input signal waveform
axes[2].plot(t[:500], signal[:500])
axes[2].set_xlabel('Time [s]')
axes[2].set_ylabel('Amplitude')
axes[2].set_title('Input Signal (first 0.5 s)')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

The periodogram shows large fluctuations in the estimate, whereas Welch’s method produces a much smoother result through averaging. The peaks at 50 Hz and 120 Hz are clearly visible, with the 120 Hz peak approximately 1/4 the power of the 50 Hz peak (since the amplitude ratio is 0.5, and power scales as amplitude squared).

Window Function Selection Guidelines

Use CaseRecommended WindowReason
General-purpose analysisHannGood balance of sidelobe suppression and resolution
Resolving close frequenciesRectangular / Kaiser (low β)Narrowest mainlobe
High dynamic range analysisBlackman / Kaiser (high β)Lowest sidelobes for detecting weak signals
Hardware / real-timeHammingSimple computation, non-zero at endpoints

In practice, it is recommended to start with the Hann window and switch to other windows as needed.

Summary

  • Spectral leakage occurs because finite-length signal truncation (rectangular windowing) causes convolution in the frequency domain
  • Window functions control the tradeoff between mainlobe width and sidelobe suppression; options include Hann, Hamming, Blackman, and Kaiser for different purposes
  • Power Spectral Density (PSD) expresses power per unit frequency, enabling comparison across different measurement conditions
  • The periodogram has high variance, so Welch’s method with segment averaging is the standard practical estimator
  • Window function selection also affects PSD estimation quality, so choosing appropriately for the analysis objective is important

References

  • Harris, F. J. (1978). “On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform.” Proceedings of the IEEE, 66(1), 51-83.
  • Welch, P. D. (1967). “The Use of Fast Fourier Transform for the Estimation of Power Spectra.” IEEE Transactions on Audio and Electroacoustics, 15(2), 70-73.
  • Oppenheim, A. V., & Schafer, R. W. (2009). Discrete-Time Signal Processing (3rd ed.). Prentice Hall.
  • NumPy FFT documentation
  • SciPy Signal Processing documentation