Monte Carlo Optimization (CEM/SA/GA/MPPI/PSO): Fundamentals, Comparison, and Python Implementation

Monte Carlo optimization (CEM/SA/GA/MPPI/PSO) fundamentals, comparison, and Python implementation organized as a hub. Common sample/evaluate/update skeleton, method-selection guide, and a numpy-only shared framework.

Introduction

Monte Carlo optimization is a family of methods that minimize (or maximize) an objective \(f(x)\) using random sampling rather than gradients \(\nabla f\) . They shine exactly where gradient methods fail: non-differentiable objectives, black-box simulators, and multi-modal landscapes.

Cross-Entropy Method (CEM), Simulated Annealing (SA), Genetic Algorithms (GA), MPPI, and Particle Swarm Optimization (PSO) are all canonical Monte Carlo optimizers built on the same “sample → evaluate → update” skeleton. This article positions each method side by side and organizes the shared framework, selection guidance, and Python implementation as a hub.

The Common Skeleton

Every Monte Carlo optimizer can be viewed as iterating three steps:

  1. Sample: draw \(N\) samples \(\{x^{(i)}\}_{i=1}^{N}\) from the current distribution \(p_t(x)\) or around the current solution \(x_t\)
  2. Evaluate: compute \(f(x^{(i)})\) for each sample
  3. Update: build the next distribution \(p_{t+1}\) or solution \(x_{t+1}\) from the evaluations

Symbolically the update is

\[ p_{t+1}(x) = \mathcal{U}\big(p_t,\, \{(x^{(i)}, f(x^{(i)}))\}_{i=1}^{N}\big) \tag{1} \]

where \(\mathcal{U}\) is the method-specific update operator. CEM fits an empirical distribution to elite quantiles, SA accepts moves with Boltzmann probability, GA applies selection / crossover / mutation, MPPI computes a cost-exponentially-weighted average, and PSO updates velocities with inertia, cognitive, and social terms. All are different choices of the same \(\mathcal{U}\) .

Bayesian optimization is also a sampling-based optimizer, but it operates through a surrogate model (Gaussian process) and an acquisition function — it emphasizes “where to sample next” over “how to update a distribution.” Particle filters are a sibling family that uses Monte Carlo sampling for state estimation, with resampling mechanics mathematically equivalent to CEM’s elite selection and MPPI’s reweighting.

Method Comparison Table

MethodSearch strategySolution formTypical useCostGradient
CEMDistribution update (elite quantile)Parametric distributionContinuous opt, RL policy searchMedium (parallel-friendly)Not needed
SAProbabilistic acceptance (temperature)Single solutionCombinatorial opt, TSPLow (sequential)Not needed
GAPopulation evolution (crossover/mutation)PopulationGlobal opt, design problemsHigh (many evals)Not needed
MPPIImportance sampling (exp weights)Distribution over trajectoriesModel predictive control, roboticsMedium–high (needs parallelism)Not needed
PSOInertia + cognitive + socialSwarm of particlesContinuous opt, swarm intelligenceMediumNot needed
Bayesian optimizationSurrogate + acquisitionGP posteriorExpensive black-box problemsHigh (surrogate update)Not needed (internal GP gradients)

All methods share three properties: gradient-free, easy to parallelize, and robust to multi-modality. They differ in which probabilistic model they maintain and how the update concentrates mass on good regions.

Shared Python Framework

Each method fits the same skeleton; only sample and update change.

import numpy as np

def monte_carlo_optimize(
    f, sampler, updater, init_state, n_iter=50, n_samples=100, seed=0
):
    """
    Generic Monte Carlo optimization loop.

    Parameters
    ----------
    f          : callable  objective f(x) to minimize
    sampler    : callable  state -> array of shape (n_samples, dim)
    updater    : callable  (state, samples, scores) -> new state
    init_state : initial state (per-method distribution params, population, ...)
    n_iter     : number of iterations
    n_samples  : samples per iteration
    """
    rng = np.random.default_rng(seed)
    state = init_state
    history = []
    for t in range(n_iter):
        samples = sampler(state, n_samples, rng)
        scores = np.array([f(x) for x in samples])
        state = updater(state, samples, scores)
        best = scores.min()
        history.append(best)
    return state, np.array(history)


# === Example 1: CEM-style update (fit Gaussian to elite quantile) ===
def cem_sampler(state, n, rng):
    mu, sigma = state
    return rng.normal(mu, sigma, size=(n, mu.shape[0]))


def cem_updater(state, samples, scores, elite_frac=0.2):
    mu, sigma = state
    k = max(1, int(len(scores) * elite_frac))
    elite_idx = np.argsort(scores)[:k]
    elite = samples[elite_idx]
    return elite.mean(axis=0), elite.std(axis=0) + 1e-6


# === Example 2: PSO-style velocity update ===
def make_pso(dim, n_particles, w=0.7, c1=1.5, c2=1.5):
    def sampler(state, n, rng):
        x, v, pbest, pbest_val, gbest = state
        return x  # PSO evaluates the state itself

    def updater(state, samples, scores):
        x, v, pbest, pbest_val, gbest = state
        # personal best
        improved = scores < pbest_val
        pbest = np.where(improved[:, None], x, pbest)
        pbest_val = np.where(improved, scores, pbest_val)
        # global best
        g_idx = pbest_val.argmin()
        gbest = pbest[g_idx]
        # velocity / position update
        rng = np.random.default_rng()
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        return x, v, pbest, pbest_val, gbest

    return sampler, updater


# === Objective: Rastrigin (multi-modal benchmark) ===
def rastrigin(x, A=10.0):
    n = x.shape[0]
    return A * n + np.sum(x ** 2 - A * np.cos(2 * np.pi * x))


# === Optimize with CEM ===
dim = 5
init_state = (np.zeros(dim), np.ones(dim) * 2.0)
state, hist = monte_carlo_optimize(
    rastrigin, cem_sampler, cem_updater, init_state,
    n_iter=40, n_samples=200
)
print(f"CEM best: {hist.min():.4f}")

This skeleton ports to all of CEM / SA / GA / MPPI / PSO. For SA, sampler becomes a random walk around the current solution and updater applies the Metropolis acceptance rule. For GA, sampler returns the current population and updater does selection + crossover + mutation. For MPPI, sampler rolls out noisy control sequences and updater averages them with cost-exponential weights to refine the nominal control.

This generalization is also theoretically meaningful: CEM and MPPI share the same root in importance sampling with an optimal proposal distribution, as discussed in detail in the MPPI article.

Deep-Dive Pointers per Method

CEM (Cross-Entropy Method)

Maximum-likelihood-fits the next distribution to the elite quantile, derived from KL divergence minimization. Extremely simple and widely used for RL policy search.

Read more: Cross-Entropy Method

SA (Simulated Annealing)

Anneals temperature \(T\) and probabilistically accepts uphill moves with Boltzmann probability \(\exp(-\Delta f / T)\) . A long-time standard for combinatorial problems and TSP.

Read more: Simulated Annealing

GA (Genetic Algorithms)

Evolves a population via selection, crossover, and mutation. Handles both discrete and continuous problems and is widely used in design, path planning, and feature selection.

Read more: Genetic Algorithms

MPPI (Model Predictive Path Integral)

Treats control sequences as a distribution and updates the nominal trajectory by cost-exponentially-weighted averaging. Effectively a continuous-time / continuous-control sibling of CEM, increasingly standard in real-time robotic MPC.

Read more: MPPI

PSO (Particle Swarm Optimization)

Updates particle velocities with inertia + personal-best attraction + global-best attraction. Intuitive to implement with few hyperparameters.

Read more: PSO

Bayesian Optimization

Builds a Gaussian-process surrogate and uses an acquisition function to choose the next evaluation point. Dominates the others when each evaluation is expensive.

Read more: Bayesian Optimization

Particle Filters (Sibling Method)

A state-estimation method rather than an optimizer, but its resampling mechanism is mathematically equivalent to CEM’s elite selection and MPPI’s reweighting — the original Monte Carlo family member.

Read more: Particle Filter Python Implementation

Selection Guide

Pragmatic guidance for choosing a method on a real problem:

SituationRecommendedReason
Evaluation is very expensive (minutes+)Bayesian optimizationOrders-of-magnitude better sample efficiency
Continuous opt, low–mid dim (up to tens)CEM, PSOSimple, fast convergence
Continuous opt, high dim (hundreds+)CEM, MPPIParametric distribution mitigates curse of dim
Combinatorial opt, TSP, schedulingSA, GADiscrete neighborhoods / individuals are natural
Real-time control, roboticsMPPIGPU-parallel sampling, 100Hz+ control loops
Strongly multi-modal, global optimum requiredGA, multimodal CEMPopulation maintains diversity
Some gradient information is availableHybrid with gradientMC for initialization → gradient method to refine

Quick Flowchart

How expensive is f(x)?
├─ Very expensive ──> Bayesian optimization
└─ Cheap–moderate
    Continuous or discrete?
    ├─ Discrete  ──> SA / GA
    └─ Continuous
        Real-time required?
        ├─ Yes ──> MPPI
        └─ No
            Need to exploit per-particle best history?
            ├─ Yes ──> PSO
            └─ No  ──> CEM

This is a first approximation. Hybrid strategies (e.g., coarse CEM followed by GA for diversity) are often effective in practice.

Connections to Signal Processing

Monte Carlo optimization touches signal processing in several ways:

  • Non-convex adaptive filtering: when standard LMS/RLS gets stuck in local minima, CEM or GA can warm-start them
  • Filter coefficient design under quantization: integer-coefficient FIR design uses SA / GA
  • Control sequence optimization in MPC for signals: MPPI is increasingly standard
  • State estimation: for nonlinear / non-Gaussian systems, particle filters outperform linear (Kalman) filters

In particular, MPPI naturally handles non-differentiable costs (collision avoidance, binary constraints) that classical LQR / MPC cannot, making it popular at the intersection of signal processing and control.

Summary

  • All Monte Carlo optimizers share the sample → evaluate → update skeleton
  • CEM / SA / GA / MPPI / PSO differ only in the update operator \(\mathcal{U}\)
  • A single Python skeleton with swappable sampler and updater implements all of them
  • Selection by evaluation cost → continuous/discrete → real-time → history use is a practical order
  • Bayesian optimization and particle filters also live in the same Monte Carlo family

See the related articles below for the full mathematics and implementations of each method.

References

  • Rubinstein, R. Y., & Kroese, D. P. (2004). The Cross-Entropy Method. Springer.
  • Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by Simulated Annealing. Science, 220(4598), 671–680.
  • Holland, J. H. (1992). Adaptation in Natural and Artificial Systems. MIT Press.
  • Williams, G., et al. (2017). Model Predictive Path Integral Control: From Theory to Parallel Computation. Journal of Guidance, Control, and Dynamics, 40(2).
  • Kennedy, J., & Eberhart, R. (1995). Particle Swarm Optimization. Proc. IEEE ICNN.
  • Shahriari, B., et al. (2016). Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE, 104(1).