Wavelet Packet Transform and Best Basis Selection: High-Resolution Time-Frequency Analysis with Python Implementation

Wavelet packet transform basics, design, Python implementation, and comparison with DWT. Covers best basis selection via Coifman-Wickerhauser entropy and full binary tree decomposition implemented with PyWavelets.

Introduction

As covered in Introduction to Wavelet Transform, the Discrete Wavelet Transform (DWT) recursively splits a signal into “approximation (low-frequency)” and “detail (high-frequency)” components. However, the standard DWT algorithm (Mallat decomposition) has a fundamental limitation:

DWT recursively decomposes only the approximation (low-frequency) branch, leaving the detail (high-frequency) branch undivided beyond a single split. As a result, the frequency resolution in high-frequency bands is coarse. For example, with a 4-level decomposition of a signal sampled at 8 kHz, the finest-detail coefficients (cD1) cover the entire 2–4 kHz band, and frequency structure within that wide band cannot be captured.

The Wavelet Packet Transform (WPT) lifts this restriction and recursively splits the entire band, including the detail side, yielding a flexible high-resolution decomposition of the time-frequency plane. Combined with a cost function (typically an entropy), the Best Basis algorithm of Coifman and Wickerhauser automatically selects the optimal subtree—a signal-adaptive time-frequency tiling.

This article covers the mathematical structure of wavelet packets, the Coifman–Wickerhauser Best Basis algorithm, and a Python implementation using PyWavelets.

Wavelet Packet Basics

Full Binary Tree Decomposition

DWT is a “left-skewed tree” (only the low-pass branch is recursively decomposed), whereas the wavelet packet transform forms a complete binary tree. Every node is split into two children by the low-pass filter \(h\) and the high-pass filter \(g\) .

Let \(d_{j,n}[k]\) denote the packet coefficients at node \((j, n)\) on level \(j\) . Then:

\[d_{j+1, 2n}[k] = \sum_{l} h[l - 2k] \, d_{j, n}[l] \tag{1}\] \[d_{j+1, 2n+1}[k] = \sum_{l} g[l - 2k] \, d_{j, n}[l] \tag{2}\]

Here \(n\) is the subband index within the level (\(0 \le n < 2^j\) ), and \(h\) , \(g\) are the scaling and wavelet filters. The root \((0, 0)\) corresponds to the input signal \(x[k]\) .

Wavelet Packet Basis Functions

In continuous time, the wavelet packet basis \(\psi_{n,j,k}(t)\) is generated by the two-scale equations:

\[\psi_{2n}(t) = \sqrt{2} \sum_k h[k] \, \psi_n(2t - k) \tag{3}\] \[\psi_{2n+1}(t) = \sqrt{2} \sum_k g[k] \, \psi_n(2t - k) \tag{4}\]

with \(\psi_0 = \phi\) (scaling function) and \(\psi_1 = \psi\) (mother wavelet). Adding shift and dyadic scaling gives \(\psi_{n,j,k}(t) = 2^{-j/2}\psi_n(2^{-j} t - k)\) , the basis function at level \(j\) , subband \(n\) .

The relation to DWT is simple: DWT corresponds to the special subtree that decomposes only \(n = 0\) at every level. WPT removes this restriction and treats the full binary tree where every node is split.

Complete Decomposition and Redundancy

Hierarchy of Subspaces

When fully expanded to depth \(J\) , the space \(L^2(\mathbb{R})\) decomposes into \(2^J\) orthogonal subspaces \(W_{J,n}\) (\(0 \le n < 2^J\) ):

\[L^2(\mathbb{R}) = \bigoplus_{n=0}^{2^J - 1} W_{J,n} \tag{5}\]

Each \(W_{J,n}\) corresponds to a uniform partition of the frequency axis, with coefficients \(d_{J,n}\) coming from the level-\(J\) filter bank output. At the same time, a nested orthogonal decomposition holds:

\[W_{j,n} = W_{j+1, 2n} \oplus W_{j+1, 2n+1} \tag{6}\]

That is, replacing the basis of a parent node by the bases of its two children is a valid orthogonal transformation. This is exactly what justifies the Best Basis algorithm below.

Energy Conservation (Parseval)

Because the wavelet packet transform is orthogonal, as long as the leaf set forms a complete decomposition (covering \([0, N)\) ), Parseval’s identity guarantees energy preservation:

\[\sum_{k} |x[k]|^2 = \sum_{(j, n) \in \mathcal{T}} \sum_{k} |d_{j,n}[k]|^2 \tag{7}\]

where \(\mathcal{T}\) is the set of leaf nodes that form a complete decomposition. This conservation law is what makes entropy-type cost comparisons meaningful.

Redundancy and “Basis” Selection

The full tree contains roughly \((2J + 1) \cdot N\) coefficients—redundant relative to the \(N\) -sample input. However, selecting one set of leaves \(\mathcal{T}\) that forms a complete decomposition gives an orthonormal basis sufficient to represent the original signal. This is the heart of WPT.

In other words, the WPT tree provides a “dictionary of countless orthonormal bases”, and the next section explains how to select the one best matched to the signal.

Best Basis Selection

Cost Functions and Entropy

The “complexity” of representing a signal \(x\) in a given basis (leaf set \(\mathcal{T}\) ) is measured by an additive cost function \(\mathcal{M}(\mathcal{T})\) . Additive means that for every coefficient \(d_{j,n}[k]\) in \(\mathcal{T}\) ,

\[\mathcal{M}(\mathcal{T}) = \sum_{(j, n) \in \mathcal{T}} \sum_{k} \mu(|d_{j,n}[k]|^2) \tag{8}\]

A canonical choice is the Shannon entropy-type cost introduced by Coifman and Wickerhauser:

\[\mu(p) = -p \log p \quad (p > 0), \quad \mu(0) = 0 \tag{9}\]

With normalization \(p_k = |d_{j,n}[k]|^2 / \|x\|^2\) , \(\mathcal{M}\) becomes the Shannon information entropy of the coefficient distribution. Smaller values mean “energy concentrated in a few large coefficients”, i.e. sparser.

Other choices include \(\ell^p\) norms (\(p < 2\) ), log-energy \(\sum \log |d|^2\) , and the count of coefficients above a threshold. The shared requirement is additivity, so that subtrees can be compared locally.

Best Basis Algorithm

The Coifman–Wickerhauser Best Basis algorithm (1992) finds the complete decomposition \(\mathcal{T}\) minimizing \(\mathcal{M}(\mathcal{T})\) in \(O(N \log N)\) . It is a bottom-up dynamic program that, at each node, applies the recurrence:

\[\mathcal{M}_{\text{best}}(j, n) = \min \big\{ \mathcal{M}(d_{j, n}), \, \mathcal{M}_{\text{best}}(j+1, 2n) + \mathcal{M}_{\text{best}}(j+1, 2n+1) \big\} \tag{10}\]
  • First term: cost of treating the node as a leaf.
  • Second term: sum of optimal costs of the two children.

If the children’s total cost is smaller than the parent’s, the parent is split; otherwise it remains a leaf. Leaves return their own cost without comparison. A single bottom-up sweep determines the optimal basis, matching the \(O(N \log N)\) cost of computing the coefficients themselves.

Geometric Interpretation

Minimizing entropy is equivalent to selecting the decomposition in which the signal energy is concentrated in a few packet coefficients:

  • A pure sinusoid → concentrates in a frequency-localized narrowband packet → deep leaves with fine frequency resolution are chosen.
  • An impulse or transient → concentrates in time-localized shallow leaves.
  • A chirp or frequency-varying signal → intermediate leaves balancing time and frequency.

Best Basis thus goes beyond the fixed tiling of STFT and the left-skewed tiling of DWT: it adaptively chooses the shape of each time-frequency tile based on the signal itself.

Python Implementation

Full Decomposition with PyWavelets

pywt.WaveletPacket makes full decomposition and node access straightforward.

import numpy as np
import matplotlib.pyplot as plt
import pywt

# --- Test signal: two sinusoids + transient pulse ---
np.random.seed(0)
fs = 1024
N = 1024
t = np.arange(N) / fs

low = np.sin(2 * np.pi * 20 * t)
high = 0.6 * np.sin(2 * np.pi * 200 * t) * (t > 0.5)
pulse = np.zeros(N)
pulse[300:305] = 2.0
signal = low + high + pulse + 0.05 * np.random.randn(N)

# --- Full wavelet packet decomposition ---
wp = pywt.WaveletPacket(data=signal, wavelet="db4", mode="symmetric", maxlevel=5)

# Get all nodes at level 5 (natural order)
nodes_level5 = wp.get_level(5, order="natural")
print(f"Number of nodes at level 5: {len(nodes_level5)}")  # 2^5 = 32
print(f"Coefficient length per node: {len(nodes_level5[0].data)}")

WaveletPacket is lazily evaluated; you can access any node directly with a binary path like wp['aad'] (a = approximation, d = detail). get_level(j, order="natural") returns the \(2^j\) nodes in natural (filtering) order.

Entropy Cost Implementation

Shannon entropy as an additive cost function:

def shannon_entropy_cost(coeffs, eps=1e-12):
    """Shannon-entropy-type cost of a coefficient vector.

    p_k = |c_k|^2 / sum(|c|^2), H = -sum(p_k * log(p_k)).
    Additive under energy conservation.
    """
    energy = np.sum(np.abs(coeffs) ** 2)
    if energy < eps:
        return 0.0
    p = (np.abs(coeffs) ** 2) / energy
    p = p[p > eps]
    return float(-np.sum(p * np.log(p)))

Best Basis Algorithm

PyWavelets does not expose Best Basis directly, so we implement the recurrence in (10):

def best_basis(wp, cost_fn, level, current_path=""):
    """Coifman-Wickerhauser Best Basis search.

    Returns
    -------
    selected_paths : list[str]   selected leaf node paths
    total_cost     : float       optimal cost of the subtree
    """
    node = wp[current_path] if current_path else wp
    own_cost = cost_fn(node.data)

    # Leaf at max decomposition depth: return self
    if len(current_path) == level:
        return [current_path], own_cost

    # Recurse into children
    left_paths, left_cost = best_basis(wp, cost_fn, level, current_path + "a")
    right_paths, right_cost = best_basis(wp, cost_fn, level, current_path + "d")
    child_cost = left_cost + right_cost

    if child_cost < own_cost:
        return left_paths + right_paths, child_cost
    else:
        return [current_path], own_cost


selected, total_cost = best_basis(wp, shannon_entropy_cost, level=5)
print(f"Number of selected leaves: {len(selected)}")
print(f"Optimal cost (entropy): {total_cost:.4f}")
print("Sample selected paths:", selected[:8])

Compared with the full decomposition (\(2^5 = 32\) leaves), the selected set is typically smaller. Depending on the signal, the algorithm picks shallow nodes in time intervals with transient pulses and deep nodes in bands dominated by stationary sinusoids—a hybrid decomposition no fixed tiling can match.

Visualizing the Best Basis Tiling

Plot the selected leaves as rectangles on the time-frequency plane:

def plot_best_basis_tiling(selected_paths, max_level, fs, ax=None):
    """Visualize Best Basis leaves as time-frequency tiles."""
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 5))

    for path in selected_paths:
        j = len(path)  # decomposition level = depth
        # Path -> subband index (natural order)
        n_natural = int(path.replace("a", "0").replace("d", "1"), 2) if path else 0
        # Natural -> frequency order (Gray-code-like reorder)
        n_freq = n_natural
        for shift in (1,):
            n_freq ^= n_natural >> shift

        f_low = n_freq * (fs / 2) / (2 ** j)
        f_high = (n_freq + 1) * (fs / 2) / (2 ** j)
        ax.add_patch(
            plt.Rectangle(
                (0, f_low),
                1.0,  # leaves cover the full time extent here
                f_high - f_low,
                fill=False,
                edgecolor="C0",
                linewidth=1.0,
            )
        )

    ax.set_xlim(0, 1)
    ax.set_ylim(0, fs / 2)
    ax.set_xlabel("Time (normalized)")
    ax.set_ylabel("Frequency [Hz]")
    ax.set_title("Best Basis Tiling (Wavelet Packet)")
    return ax


fig, ax = plt.subplots(figsize=(10, 5))
plot_best_basis_tiling(selected, max_level=5, fs=fs, ax=ax)
plt.tight_layout()
plt.show()

Tall tiles mean “sacrifice frequency resolution for time resolution”; wide tiles mean the opposite. Their shapes adapt to the signal, in contrast to the uniform tiles of STFT and the left-skewed tiles of DWT.

Full Decomposition vs Best Basis: Sparsity

Quantify how Best Basis improves sparsity:

def coefficient_magnitudes(wp, paths):
    """Concatenate magnitudes of coefficients from the given paths."""
    coeffs = []
    for p in paths:
        coeffs.append(np.abs(wp[p].data))
    return np.concatenate(coeffs)


full_paths = [n.path for n in wp.get_level(5, order="natural")]
mags_full = np.sort(coefficient_magnitudes(wp, full_paths))[::-1]
mags_best = np.sort(coefficient_magnitudes(wp, selected))[::-1]


def energy_concentration(mags, k_ratio=0.05):
    """Energy captured by the top k_ratio of coefficients."""
    k = max(1, int(len(mags) * k_ratio))
    total = np.sum(mags ** 2)
    top = np.sum(mags[:k] ** 2)
    return top / total


print(f"Full decomposition (level 5): top-5% energy = {energy_concentration(mags_full):.3f}")
print(f"Best Basis                  : top-5% energy = {energy_concentration(mags_best):.3f}")

Best Basis concentrates energy in a few coefficients, improving the sparsity metric used for compression and thresholding.

Applications

Wavelet packet transform with Best Basis selection is especially powerful in:

Signal Compression

Generalizes DWT-based compression (as in JPEG 2000) by adapting the basis to each signal. For acoustic signals with dominant narrowband structure, it can outperform DWT.

Non-stationary Signal Classification

EEG, EMG, and machine vibration signals exhibit class-specific time-frequency patterns. Selecting representative packets per class via Best Basis and using their coefficient statistics as features captures discriminative structure that fixed tilings miss. This is also applicable as feature extraction for Time-Series Anomaly Detection.

Speech and Audio Analysis

Speech contains narrowband formant resonances and transient plosives, which fixed-window STFT cannot resolve simultaneously. Best Basis captures both at high resolution, and is also used in music chord analysis and bioacoustic studies of dolphin and bat calls.

Communications and Radar Signal Detection

For unknown modulations or chirps, energy-concentrated leaves serve as features that reveal structure invisible to FFT or Window Functions and PSD.

Comparison: DWT vs WPT + Best Basis

FeatureDWTWPT + Best Basis
Decomposition treeLeft-skewed (only approx branch)Full binary tree (recurses everywhere)
High-freq resolutionCoarse (finest cD is wideband)Fine (split to any depth)
BasisFixedSignal-adaptive (cost minimization)
Number of bases1~\(2^{2^{J-1}}\) complete decompositions
Decomposition cost\(O(N)\)\(O(N \log N)\)
Basis-search costN/A\(O(N \log N)\) (Coifman–Wickerhauser)
Energy conservationYesYes (with complete leaf set)
Main useDenoising, MRACompression, classification, sparsity

WPT is a strict superset of DWT: with sufficient \(J\) , the DWT decomposition is always a candidate inside the WPT tree. For signals where DWT is optimal, Best Basis recovers the DWT-style left-skewed tree. Hence WPT is always at least as efficient as DWT.

Summary

  • Wavelet Packet Transform recursively splits both branches that DWT leaves undivided, yielding an orthogonal decomposition into \(2^J\) subspaces.
  • Each complete leaf set defines an orthonormal basis, so the WPT tree is a dictionary of bases.
  • The Coifman–Wickerhauser Best Basis algorithm minimizes an additive cost (e.g. Shannon entropy) in \(O(N \log N)\) and yields a signal-adaptive optimal basis.
  • Combining pywt.WaveletPacket with a custom recursion supports Best Basis selection, tiling visualization, and sparsity evaluation in a unified pipeline.
  • Applications span signal compression, classification, speech/audio analysis, and communications—DWT is always available as a special case, so WPT is at least as efficient.

WPT with Best Basis is the natural extension of wavelet theory toward signal-adaptive time-frequency analysis, and the key to going beyond the limits of fixed tilings.

References

  • Coifman, R. R., & Wickerhauser, M. V. (1992). Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, 38(2), 713–718.
  • Mallat, S. (2008). A Wavelet Tour of Signal Processing: The Sparse Way (3rd ed.). Academic Press.
  • Wickerhauser, M. V. (1994). Adapted Wavelet Analysis from Theory to Software. A K Peters.
  • PyWavelets documentation: WaveletPacket