Introduction
As covered in Introduction to Wavelet Transform, the Discrete Wavelet Transform (DWT) recursively splits a signal into “approximation (low-frequency)” and “detail (high-frequency)” components. However, the standard DWT algorithm (Mallat decomposition) has a fundamental limitation:
DWT recursively decomposes only the approximation (low-frequency) branch, leaving the detail (high-frequency) branch undivided beyond a single split. As a result, the frequency resolution in high-frequency bands is coarse. For example, with a 4-level decomposition of a signal sampled at 8 kHz, the finest-detail coefficients (cD1) cover the entire 2–4 kHz band, and frequency structure within that wide band cannot be captured.
The Wavelet Packet Transform (WPT) lifts this restriction and recursively splits the entire band, including the detail side, yielding a flexible high-resolution decomposition of the time-frequency plane. Combined with a cost function (typically an entropy), the Best Basis algorithm of Coifman and Wickerhauser automatically selects the optimal subtree—a signal-adaptive time-frequency tiling.
This article covers the mathematical structure of wavelet packets, the Coifman–Wickerhauser Best Basis algorithm, and a Python implementation using PyWavelets.
Wavelet Packet Basics
Full Binary Tree Decomposition
DWT is a “left-skewed tree” (only the low-pass branch is recursively decomposed), whereas the wavelet packet transform forms a complete binary tree. Every node is split into two children by the low-pass filter \(h\) and the high-pass filter \(g\) .
Let \(d_{j,n}[k]\) denote the packet coefficients at node \((j, n)\) on level \(j\) . Then:
\[d_{j+1, 2n}[k] = \sum_{l} h[l - 2k] \, d_{j, n}[l] \tag{1}\] \[d_{j+1, 2n+1}[k] = \sum_{l} g[l - 2k] \, d_{j, n}[l] \tag{2}\]Here \(n\) is the subband index within the level (\(0 \le n < 2^j\) ), and \(h\) , \(g\) are the scaling and wavelet filters. The root \((0, 0)\) corresponds to the input signal \(x[k]\) .
Wavelet Packet Basis Functions
In continuous time, the wavelet packet basis \(\psi_{n,j,k}(t)\) is generated by the two-scale equations:
\[\psi_{2n}(t) = \sqrt{2} \sum_k h[k] \, \psi_n(2t - k) \tag{3}\] \[\psi_{2n+1}(t) = \sqrt{2} \sum_k g[k] \, \psi_n(2t - k) \tag{4}\]with \(\psi_0 = \phi\) (scaling function) and \(\psi_1 = \psi\) (mother wavelet). Adding shift and dyadic scaling gives \(\psi_{n,j,k}(t) = 2^{-j/2}\psi_n(2^{-j} t - k)\) , the basis function at level \(j\) , subband \(n\) .
The relation to DWT is simple: DWT corresponds to the special subtree that decomposes only \(n = 0\) at every level. WPT removes this restriction and treats the full binary tree where every node is split.
Complete Decomposition and Redundancy
Hierarchy of Subspaces
When fully expanded to depth \(J\) , the space \(L^2(\mathbb{R})\) decomposes into \(2^J\) orthogonal subspaces \(W_{J,n}\) (\(0 \le n < 2^J\) ):
\[L^2(\mathbb{R}) = \bigoplus_{n=0}^{2^J - 1} W_{J,n} \tag{5}\]Each \(W_{J,n}\) corresponds to a uniform partition of the frequency axis, with coefficients \(d_{J,n}\) coming from the level-\(J\) filter bank output. At the same time, a nested orthogonal decomposition holds:
\[W_{j,n} = W_{j+1, 2n} \oplus W_{j+1, 2n+1} \tag{6}\]That is, replacing the basis of a parent node by the bases of its two children is a valid orthogonal transformation. This is exactly what justifies the Best Basis algorithm below.
Energy Conservation (Parseval)
Because the wavelet packet transform is orthogonal, as long as the leaf set forms a complete decomposition (covering \([0, N)\) ), Parseval’s identity guarantees energy preservation:
\[\sum_{k} |x[k]|^2 = \sum_{(j, n) \in \mathcal{T}} \sum_{k} |d_{j,n}[k]|^2 \tag{7}\]where \(\mathcal{T}\) is the set of leaf nodes that form a complete decomposition. This conservation law is what makes entropy-type cost comparisons meaningful.
Redundancy and “Basis” Selection
The full tree contains roughly \((2J + 1) \cdot N\) coefficients—redundant relative to the \(N\) -sample input. However, selecting one set of leaves \(\mathcal{T}\) that forms a complete decomposition gives an orthonormal basis sufficient to represent the original signal. This is the heart of WPT.
In other words, the WPT tree provides a “dictionary of countless orthonormal bases”, and the next section explains how to select the one best matched to the signal.
Best Basis Selection
Cost Functions and Entropy
The “complexity” of representing a signal \(x\) in a given basis (leaf set \(\mathcal{T}\) ) is measured by an additive cost function \(\mathcal{M}(\mathcal{T})\) . Additive means that for every coefficient \(d_{j,n}[k]\) in \(\mathcal{T}\) ,
\[\mathcal{M}(\mathcal{T}) = \sum_{(j, n) \in \mathcal{T}} \sum_{k} \mu(|d_{j,n}[k]|^2) \tag{8}\]A canonical choice is the Shannon entropy-type cost introduced by Coifman and Wickerhauser:
\[\mu(p) = -p \log p \quad (p > 0), \quad \mu(0) = 0 \tag{9}\]With normalization \(p_k = |d_{j,n}[k]|^2 / \|x\|^2\) , \(\mathcal{M}\) becomes the Shannon information entropy of the coefficient distribution. Smaller values mean “energy concentrated in a few large coefficients”, i.e. sparser.
Other choices include \(\ell^p\) norms (\(p < 2\) ), log-energy \(\sum \log |d|^2\) , and the count of coefficients above a threshold. The shared requirement is additivity, so that subtrees can be compared locally.
Best Basis Algorithm
The Coifman–Wickerhauser Best Basis algorithm (1992) finds the complete decomposition \(\mathcal{T}\) minimizing \(\mathcal{M}(\mathcal{T})\) in \(O(N \log N)\) . It is a bottom-up dynamic program that, at each node, applies the recurrence:
\[\mathcal{M}_{\text{best}}(j, n) = \min \big\{ \mathcal{M}(d_{j, n}), \, \mathcal{M}_{\text{best}}(j+1, 2n) + \mathcal{M}_{\text{best}}(j+1, 2n+1) \big\} \tag{10}\]- First term: cost of treating the node as a leaf.
- Second term: sum of optimal costs of the two children.
If the children’s total cost is smaller than the parent’s, the parent is split; otherwise it remains a leaf. Leaves return their own cost without comparison. A single bottom-up sweep determines the optimal basis, matching the \(O(N \log N)\) cost of computing the coefficients themselves.
Geometric Interpretation
Minimizing entropy is equivalent to selecting the decomposition in which the signal energy is concentrated in a few packet coefficients:
- A pure sinusoid → concentrates in a frequency-localized narrowband packet → deep leaves with fine frequency resolution are chosen.
- An impulse or transient → concentrates in time-localized shallow leaves.
- A chirp or frequency-varying signal → intermediate leaves balancing time and frequency.
Best Basis thus goes beyond the fixed tiling of STFT and the left-skewed tiling of DWT: it adaptively chooses the shape of each time-frequency tile based on the signal itself.
Python Implementation
Full Decomposition with PyWavelets
pywt.WaveletPacket makes full decomposition and node access straightforward.
import numpy as np
import matplotlib.pyplot as plt
import pywt
# --- Test signal: two sinusoids + transient pulse ---
np.random.seed(0)
fs = 1024
N = 1024
t = np.arange(N) / fs
low = np.sin(2 * np.pi * 20 * t)
high = 0.6 * np.sin(2 * np.pi * 200 * t) * (t > 0.5)
pulse = np.zeros(N)
pulse[300:305] = 2.0
signal = low + high + pulse + 0.05 * np.random.randn(N)
# --- Full wavelet packet decomposition ---
wp = pywt.WaveletPacket(data=signal, wavelet="db4", mode="symmetric", maxlevel=5)
# Get all nodes at level 5 (natural order)
nodes_level5 = wp.get_level(5, order="natural")
print(f"Number of nodes at level 5: {len(nodes_level5)}") # 2^5 = 32
print(f"Coefficient length per node: {len(nodes_level5[0].data)}")
WaveletPacket is lazily evaluated; you can access any node directly with a binary path like wp['aad'] (a = approximation, d = detail). get_level(j, order="natural") returns the \(2^j\)
nodes in natural (filtering) order.
Entropy Cost Implementation
Shannon entropy as an additive cost function:
def shannon_entropy_cost(coeffs, eps=1e-12):
"""Shannon-entropy-type cost of a coefficient vector.
p_k = |c_k|^2 / sum(|c|^2), H = -sum(p_k * log(p_k)).
Additive under energy conservation.
"""
energy = np.sum(np.abs(coeffs) ** 2)
if energy < eps:
return 0.0
p = (np.abs(coeffs) ** 2) / energy
p = p[p > eps]
return float(-np.sum(p * np.log(p)))
Best Basis Algorithm
PyWavelets does not expose Best Basis directly, so we implement the recurrence in (10):
def best_basis(wp, cost_fn, level, current_path=""):
"""Coifman-Wickerhauser Best Basis search.
Returns
-------
selected_paths : list[str] selected leaf node paths
total_cost : float optimal cost of the subtree
"""
node = wp[current_path] if current_path else wp
own_cost = cost_fn(node.data)
# Leaf at max decomposition depth: return self
if len(current_path) == level:
return [current_path], own_cost
# Recurse into children
left_paths, left_cost = best_basis(wp, cost_fn, level, current_path + "a")
right_paths, right_cost = best_basis(wp, cost_fn, level, current_path + "d")
child_cost = left_cost + right_cost
if child_cost < own_cost:
return left_paths + right_paths, child_cost
else:
return [current_path], own_cost
selected, total_cost = best_basis(wp, shannon_entropy_cost, level=5)
print(f"Number of selected leaves: {len(selected)}")
print(f"Optimal cost (entropy): {total_cost:.4f}")
print("Sample selected paths:", selected[:8])
Compared with the full decomposition (\(2^5 = 32\) leaves), the selected set is typically smaller. Depending on the signal, the algorithm picks shallow nodes in time intervals with transient pulses and deep nodes in bands dominated by stationary sinusoids—a hybrid decomposition no fixed tiling can match.
Visualizing the Best Basis Tiling
Plot the selected leaves as rectangles on the time-frequency plane:
def plot_best_basis_tiling(selected_paths, max_level, fs, ax=None):
"""Visualize Best Basis leaves as time-frequency tiles."""
if ax is None:
_, ax = plt.subplots(figsize=(10, 5))
for path in selected_paths:
j = len(path) # decomposition level = depth
# Path -> subband index (natural order)
n_natural = int(path.replace("a", "0").replace("d", "1"), 2) if path else 0
# Natural -> frequency order (Gray-code-like reorder)
n_freq = n_natural
for shift in (1,):
n_freq ^= n_natural >> shift
f_low = n_freq * (fs / 2) / (2 ** j)
f_high = (n_freq + 1) * (fs / 2) / (2 ** j)
ax.add_patch(
plt.Rectangle(
(0, f_low),
1.0, # leaves cover the full time extent here
f_high - f_low,
fill=False,
edgecolor="C0",
linewidth=1.0,
)
)
ax.set_xlim(0, 1)
ax.set_ylim(0, fs / 2)
ax.set_xlabel("Time (normalized)")
ax.set_ylabel("Frequency [Hz]")
ax.set_title("Best Basis Tiling (Wavelet Packet)")
return ax
fig, ax = plt.subplots(figsize=(10, 5))
plot_best_basis_tiling(selected, max_level=5, fs=fs, ax=ax)
plt.tight_layout()
plt.show()
Tall tiles mean “sacrifice frequency resolution for time resolution”; wide tiles mean the opposite. Their shapes adapt to the signal, in contrast to the uniform tiles of STFT and the left-skewed tiles of DWT.
Full Decomposition vs Best Basis: Sparsity
Quantify how Best Basis improves sparsity:
def coefficient_magnitudes(wp, paths):
"""Concatenate magnitudes of coefficients from the given paths."""
coeffs = []
for p in paths:
coeffs.append(np.abs(wp[p].data))
return np.concatenate(coeffs)
full_paths = [n.path for n in wp.get_level(5, order="natural")]
mags_full = np.sort(coefficient_magnitudes(wp, full_paths))[::-1]
mags_best = np.sort(coefficient_magnitudes(wp, selected))[::-1]
def energy_concentration(mags, k_ratio=0.05):
"""Energy captured by the top k_ratio of coefficients."""
k = max(1, int(len(mags) * k_ratio))
total = np.sum(mags ** 2)
top = np.sum(mags[:k] ** 2)
return top / total
print(f"Full decomposition (level 5): top-5% energy = {energy_concentration(mags_full):.3f}")
print(f"Best Basis : top-5% energy = {energy_concentration(mags_best):.3f}")
Best Basis concentrates energy in a few coefficients, improving the sparsity metric used for compression and thresholding.
Applications
Wavelet packet transform with Best Basis selection is especially powerful in:
Signal Compression
Generalizes DWT-based compression (as in JPEG 2000) by adapting the basis to each signal. For acoustic signals with dominant narrowband structure, it can outperform DWT.
Non-stationary Signal Classification
EEG, EMG, and machine vibration signals exhibit class-specific time-frequency patterns. Selecting representative packets per class via Best Basis and using their coefficient statistics as features captures discriminative structure that fixed tilings miss. This is also applicable as feature extraction for Time-Series Anomaly Detection.
Speech and Audio Analysis
Speech contains narrowband formant resonances and transient plosives, which fixed-window STFT cannot resolve simultaneously. Best Basis captures both at high resolution, and is also used in music chord analysis and bioacoustic studies of dolphin and bat calls.
Communications and Radar Signal Detection
For unknown modulations or chirps, energy-concentrated leaves serve as features that reveal structure invisible to FFT or Window Functions and PSD.
Comparison: DWT vs WPT + Best Basis
| Feature | DWT | WPT + Best Basis |
|---|---|---|
| Decomposition tree | Left-skewed (only approx branch) | Full binary tree (recurses everywhere) |
| High-freq resolution | Coarse (finest cD is wideband) | Fine (split to any depth) |
| Basis | Fixed | Signal-adaptive (cost minimization) |
| Number of bases | 1 | ~\(2^{2^{J-1}}\) complete decompositions |
| Decomposition cost | \(O(N)\) | \(O(N \log N)\) |
| Basis-search cost | N/A | \(O(N \log N)\) (Coifman–Wickerhauser) |
| Energy conservation | Yes | Yes (with complete leaf set) |
| Main use | Denoising, MRA | Compression, classification, sparsity |
WPT is a strict superset of DWT: with sufficient \(J\) , the DWT decomposition is always a candidate inside the WPT tree. For signals where DWT is optimal, Best Basis recovers the DWT-style left-skewed tree. Hence WPT is always at least as efficient as DWT.
Summary
- Wavelet Packet Transform recursively splits both branches that DWT leaves undivided, yielding an orthogonal decomposition into \(2^J\) subspaces.
- Each complete leaf set defines an orthonormal basis, so the WPT tree is a dictionary of bases.
- The Coifman–Wickerhauser Best Basis algorithm minimizes an additive cost (e.g. Shannon entropy) in \(O(N \log N)\) and yields a signal-adaptive optimal basis.
- Combining
pywt.WaveletPacketwith a custom recursion supports Best Basis selection, tiling visualization, and sparsity evaluation in a unified pipeline. - Applications span signal compression, classification, speech/audio analysis, and communications—DWT is always available as a special case, so WPT is at least as efficient.
WPT with Best Basis is the natural extension of wavelet theory toward signal-adaptive time-frequency analysis, and the key to going beyond the limits of fixed tilings.
Related Articles
- Introduction to Wavelet Transform: Time-Frequency Analysis with Python - Covers the CWT/DWT, MRA, and filter banks that underlie this article.
- Short-Time Fourier Transform (STFT): Theory and Python Implementation - Fixed-window time-frequency analysis; the comparison target for adaptive WPT tilings.
- Fast Fourier Transform (FFT): Theory and Python Implementation - The starting point of spectral analysis inside each packet node.
- DTFT vs DFT vs FFT: Definitions, Relationships, and Python Implementation - Clarifies the frequency-representation hierarchy that connects to the spectral interpretation of packet coefficients.
- Window Functions and Power Spectral Density (PSD) - Underpins spectrum estimation inside each packet node.
- Hilbert Transform and Analytic Signal - Extracts envelope and instantaneous frequency per packet, complementing WPT.
- Time-Series Anomaly Detection - Best-Basis-derived features make a strong preprocessing step for anomaly detection.
- Time-Frequency Analysis Guide - Hub that places WPT next to FFT, STFT, CWT, and the Hilbert transform, organized by tile shape and signal adaptivity.
References
- Coifman, R. R., & Wickerhauser, M. V. (1992). Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, 38(2), 713–718.
- Mallat, S. (2008). A Wavelet Tour of Signal Processing: The Sparse Way (3rd ed.). Academic Press.
- Wickerhauser, M. V. (1994). Adapted Wavelet Analysis from Theory to Software. A K Peters.
- PyWavelets documentation: WaveletPacket