What is MPPI?
In the article on the Cross-Entropy Method (CEM), we introduced a method that iteratively optimizes the parameters of a sampling distribution via importance sampling.
Model Predictive Path Integral (MPPI) is a sampling-based model predictive control (MPC) algorithm grounded in stochastic optimal control theory. Although CEM and MPPI appear to take different approaches, they can be understood through a unified mathematical framework based on importance sampling and variational inference.
- CEM: “Hard” selection of elite samples (top \(P\)%)
- MPPI: “Soft” exponential weighting of all samples based on cost (the sketch below contrasts the two)
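To make the contrast concrete, here is a minimal sketch (hypothetical costs, NumPy only) of the two selection schemes applied to the same batch of sample costs:

import numpy as np

costs = np.array([3.0, 1.0, 4.0, 1.5, 9.0])  # hypothetical per-sample costs

# CEM-style hard selection: keep the best 40% (elite), weight them uniformly
n_elite = int(0.4 * len(costs))
elite_idx = np.argsort(costs)[:n_elite]
w_cem = np.zeros(len(costs))
w_cem[elite_idx] = 1.0 / n_elite

# MPPI-style soft weighting: exponential (Boltzmann) weights with temperature lam
lam = 1.0
w_mppi = np.exp(-costs / lam)
w_mppi /= w_mppi.sum()

print(w_cem)   # [0.  0.5 0.  0.5 0. ]  -- all-or-nothing
print(w_mppi)  # smooth weights favoring low-cost samples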
Reference: Williams, G., et al. (2017). “Information Theoretic MPC for Model-Based Reinforcement Learning.” ICRA 2017.
Problem Setup
Consider a discrete-time stochastic dynamical system:
\[ \mathbf{x}_{t+1} = F(\mathbf{x}_t, \mathbf{u}_t) + \boldsymbol{\epsilon}_t, \quad \boldsymbol{\epsilon}_t \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}) \tag{1} \]
where \(\mathbf{x}_t\) is the state, \(\mathbf{u}_t\) is the control input, and \(\boldsymbol{\epsilon}_t\) is the system noise.
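As a minimal sketch of Eq. (1), a single simulation step might look like the following, where `F` is a placeholder for your dynamics model and the names are illustrative:

import numpy as np

def step(x, u, F, Sigma, rng=np.random.default_rng()):
    """One step of Eq. (1): deterministic dynamics plus Gaussian system noise."""
    eps = rng.multivariate_normal(np.zeros(len(x)), Sigma)
    return F(x, u) + eps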
The cost function for a control sequence \(\mathbf{U} = (\mathbf{u}_0, \mathbf{u}_1, \ldots, \mathbf{u}_{T-1})\) is defined as:
\[ J(\mathbf{U}) = \phi(\mathbf{x}_T) + \sum_{t=0}^{T-1} q(\mathbf{x}_t, \mathbf{u}_t) \tag{2} \]
where \(\phi\) is the terminal cost and \(q\) is the stage cost.
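Equation (2) translates directly into a rollout: simulate the model forward under \(\mathbf{U}\) and accumulate stage costs plus the terminal cost. A minimal sketch, with `F`, `q`, and `phi` as placeholders:

def trajectory_cost(x0, U, F, q, phi):
    """Evaluate Eq. (2) for a control sequence U = (u_0, ..., u_{T-1})."""
    x, J = x0, 0.0
    for u in U:
        J += q(x, u)   # stage cost
        x = F(x, u)    # propagate dynamics
    return J + phi(x)  # add terminal cost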
Derivation of MPPI
Optimal Control Distribution
In the stochastic optimal control framework, we treat the control sequence as a random variable and define the optimal control distribution as:
\[ p^*(\mathbf{V}) \propto \exp\left(-\frac{1}{\lambda} J(\mathbf{V})\right) \cdot p(\mathbf{V}) \tag{3} \]
where \(\mathbf{V}\) is the noisy control sequence, \(\lambda > 0\) is the temperature parameter, and \(p(\mathbf{V})\) is the prior distribution (current sampling distribution).
The role of the temperature \(\lambda\) (a quick numeric sketch follows this list):
- \(\lambda \to 0\): Selects only the minimum-cost control sequence (hard selection)
- \(\lambda \to \infty\): Weights all control sequences equally
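The sketch below illustrates both limits on three hypothetical costs:

import numpy as np

costs = np.array([1.0, 2.0, 5.0])

for lam in [0.1, 1.0, 100.0]:
    w = np.exp(-costs / lam)
    print(lam, w / w.sum())
# lam=0.1   -> weights ~ [1.00, 0.00, 0.00]  (hard selection of the best sample)
# lam=1.0   -> weights ~ [0.72, 0.27, 0.01]
# lam=100.0 -> weights ~ [0.34, 0.34, 0.33]  (nearly uniform)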
Weighted Control Update
We draw \(N\) samples \(\mathbf{V}^{(1)}, \ldots, \mathbf{V}^{(N)}\) from the prior \(p(\mathbf{V})\) and compute the cost \(J(\mathbf{V}^{(i)})\) for each.
The importance sampling weights are computed as:
\[ w^{(i)} = \frac{\exp\left(-\frac{1}{\lambda} J(\mathbf{V}^{(i)})\right)}{\sum_{j=1}^N \exp\left(-\frac{1}{\lambda} J(\mathbf{V}^{(j)})\right)} \tag{4} \]
The optimal control input is obtained as a weighted average:
\[ \mathbf{u}_t^* = \sum_{i=1}^N w^{(i)} \mathbf{v}_t^{(i)} \tag{5} \]
where \(\mathbf{v}_t^{(i)}\) denotes the control at time \(t\) in the sampled sequence \(\mathbf{V}^{(i)}\).
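Equations (4) and (5) are only a few lines of NumPy. A sketch, assuming `costs` has shape `(N,)` and the samples `V` have shape `(N, T, n_ctrl)`:

import numpy as np

def mppi_weighted_average(V, costs, lam):
    """Eqs. (4)-(5): softmax weights over costs, then a weighted average of samples."""
    w = np.exp(-(costs - costs.min()) / lam)  # subtract min for numerical stability
    w /= w.sum()
    return np.einsum("i,itk->tk", w, V)       # u*_t = sum_i w^(i) v_t^(i)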
Comparison with CEM
Both CEM and MPPI share the structure of sampling control sequences from a current distribution and updating the distribution based on cost information.
CEM (Hard Selection)
CEM selects the top \(P\)% elite samples and updates the distribution parameters:
\[ \boldsymbol{\mu}_{\text{new}} = \frac{1}{|\mathcal{E}|} \sum_{i \in \mathcal{E}} \mathbf{V}^{(i)} \tag{6} \]
\[ \boldsymbol{\Sigma}_{\text{new}} = \frac{1}{|\mathcal{E}|} \sum_{i \in \mathcal{E}} (\mathbf{V}^{(i)} - \boldsymbol{\mu}_{\text{new}})(\mathbf{V}^{(i)} - \boldsymbol{\mu}_{\text{new}})^T \tag{7} \]
where \(\mathcal{E}\) is the set of elite samples.
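A sketch of the update in Eqs. (6)-(7), with each sampled sequence flattened so the covariance is taken over the whole sequence; the shapes and elite fraction are illustrative:

import numpy as np

def cem_update(V, costs, elite_frac=0.1):
    """Eqs. (6)-(7): fit mean and covariance to the elite set."""
    N = len(costs)
    elite = V[np.argsort(costs)[: max(1, int(elite_frac * N))]]
    flat = elite.reshape(len(elite), -1)               # flatten each sequence
    mu_new = flat.mean(axis=0)                         # Eq. (6)
    Sigma_new = np.cov(flat, rowvar=False, bias=True)  # Eq. (7), 1/|E| normalization
    return mu_new, Sigma_new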
MPPI (Soft Weighting)
MPPI uses all samples with exponential weights:
\[ \boldsymbol{\mu}_{\text{new}} = \sum_{i=1}^N w^{(i)} \mathbf{V}^{(i)} \tag{8} \]
Unified Understanding: Variational Inference MPC
Both methods can be unified under the variational inference framework:
\[ \min_q \, D(q(\mathbf{U}) \| p^*(\mathbf{U})) \tag{9} \]
where \(D\) is a divergence measure.
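To make Eq. (9) concrete for MPPI (a sketch, not a full derivation): if \(q\) is restricted to a Gaussian family and \(D\) is taken to be the forward KL divergence \(D_{\mathrm{KL}}(p^* \| q)\), the optimal Gaussian matches the moments of \(p^*\); estimating the mean by self-normalized importance sampling with proposal \(p\) then recovers exactly the MPPI update (8):
\[ \boldsymbol{\mu}_{\text{new}} = \mathbb{E}_{p^*}[\mathbf{V}] = \frac{\mathbb{E}_{p}\!\left[ e^{-J(\mathbf{V})/\lambda}\, \mathbf{V} \right]}{\mathbb{E}_{p}\!\left[ e^{-J(\mathbf{V})/\lambda} \right]} \approx \sum_{i=1}^N w^{(i)} \mathbf{V}^{(i)} \]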
| | CEM | MPPI |
|---|---|---|
| Sample usage | Top \(P\)% only | All samples |
| Weighting | Uniform (0 or 1) | Exponential (continuous) |
| Parameters | Elite ratio \(P\) | Temperature \(\lambda\) |
| Covariance update | Yes | No (typically) |
| Mathematical interpretation | Hard threshold | Boltzmann distribution |
MPPI Algorithm
Input: initial control sequence μ = (μ_0, ..., μ_{T-1}), number of samples N, temperature λ, noise covariance Σ
Repeat:
1. Sample N noisy control sequences:
ε_t^(i) ~ N(0, Σ), V^(i) = μ + ε^(i)
2. Compute cost for each sample:
J(V^(i)) = φ(x_T^(i)) + Σ_t q(x_t^(i), v_t^(i))
3. Compute weights:
w^(i) = exp(-J(V^(i))/λ) / Σ_j exp(-J(V^(j))/λ)
4. Update control input:
μ_t ← μ_t + Σ_i w^(i) ε_t^(i)
5. Apply first control input μ_0 and shift the sequence
Python Implementation
We implement MPPI for a 2D goal-reaching problem.
Problem Definition
import numpy as np
import matplotlib.pyplot as plt
# ---- Problem setup ----
n_state = 4 # state dimension [x, y, vx, vy]
n_ctrl = 2 # control dimension [ax, ay]
dt = 0.1 # time step
T = 20 # prediction horizon
N = 500 # number of samples
lam = 1.0 # temperature parameter
sigma = 0.5 # control noise standard deviation
goal = np.array([5.0, 5.0]) # goal position
def dynamics(x, u):
"""Linear dynamics model (2D point mass)"""
x_next = np.zeros(n_state)
x_next[0] = x[0] + x[2] * dt # px
x_next[1] = x[1] + x[3] * dt # py
x_next[2] = x[2] + u[0] * dt # vx
x_next[3] = x[3] + u[1] * dt # vy
return x_next
def stage_cost(x, u):
"""Stage cost"""
pos = x[:2]
dist = np.sum((pos - goal)**2)
ctrl = 0.01 * np.sum(u**2)
return dist + ctrl
def terminal_cost(x):
"""Terminal cost"""
pos = x[:2]
return 10.0 * np.sum((pos - goal)**2)
MPPI Controller
class MPPIController:
def __init__(self):
# Initialize control sequence to zeros
self.mu = np.zeros((T, n_ctrl))
def compute_control(self, x0):
"""Compute optimal control input using MPPI"""
# Step 1: Sample noisy control sequences
noise = np.random.randn(N, T, n_ctrl) * sigma
V = self.mu[np.newaxis, :, :] + noise # (N, T, n_ctrl)
# Step 2: Compute cost for each sample
costs = np.zeros(N)
for i in range(N):
x = x0.copy()
for t in range(T):
costs[i] += stage_cost(x, V[i, t])
x = dynamics(x, V[i, t])
costs[i] += terminal_cost(x)
# Step 3: Compute weights (Eq. 4)
# Subtract minimum cost for numerical stability
costs_shifted = costs - np.min(costs)
weights = np.exp(-costs_shifted / lam)
weights /= np.sum(weights) # normalize
# Step 4: Update control input (Eq. 5)
weighted_noise = np.sum(
weights[:, np.newaxis, np.newaxis] * noise, axis=0
)
self.mu += weighted_noise
# Optimal control input (first time step)
u_opt = self.mu[0].copy()
# Shift control sequence for next time step
self.mu = np.roll(self.mu, -1, axis=0)
self.mu[-1] = 0.0 # set last element to zero
return u_opt
Simulation
np.random.seed(42)
# Initial state
x = np.array([0.0, 0.0, 0.0, 0.0])
controller = MPPIController()
# Simulation
n_steps = 100
trajectory = [x.copy()]
controls = []
for step in range(n_steps):
u = controller.compute_control(x)
controls.append(u.copy())
x = dynamics(x, u)
trajectory.append(x.copy())
# Check goal
if np.linalg.norm(x[:2] - goal) < 0.1:
print(f"Goal reached at step {step + 1}")
break
trajectory = np.array(trajectory)
controls = np.array(controls)
# ---- Plot results ----
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Trajectory
axes[0].plot(trajectory[:, 0], trajectory[:, 1], "b-o",
markersize=3, label="MPPI trajectory")
axes[0].plot(*goal, "r*", markersize=15, label="Goal")
axes[0].plot(0, 0, "gs", markersize=10, label="Start")
axes[0].set_xlabel("x")
axes[0].set_ylabel("y")
axes[0].set_title("MPPI - Goal Reaching")
axes[0].legend()
axes[0].grid(True)
axes[0].set_aspect("equal")
# Control inputs
axes[1].plot(controls[:, 0], label="$a_x$")
axes[1].plot(controls[:, 1], label="$a_y$")
axes[1].set_xlabel("Step")
axes[1].set_ylabel("Control input")
axes[1].set_title("Control inputs over time")
axes[1].legend()
axes[1].grid(True)
plt.tight_layout()
plt.savefig("mppi_result.png", dpi=150)
plt.show()
Effect of the Temperature Parameter
The temperature \(\lambda\) controls the concentration of the weights:
# Visualize the effect of temperature parameter
costs_example = np.linspace(0, 10, 100)
fig, ax = plt.subplots(figsize=(8, 5))
for lam_val in [0.1, 0.5, 1.0, 5.0]:
w = np.exp(-costs_example / lam_val)
w /= np.sum(w)
ax.plot(costs_example, w, label=f"$\\lambda={lam_val}$")
ax.set_xlabel("Cost $J$")
ax.set_ylabel("Weight $w$")
ax.set_title("Temperature parameter effect on MPPI weights")
ax.legend()
ax.grid(True)
plt.tight_layout()
plt.savefig("mppi_temperature.png", dpi=150)
plt.show()
A smaller \(\lambda\) concentrates weights on low-cost samples (approaching CEM’s hard selection), while a larger \(\lambda\) leads to more uniform weighting.
References
- Williams, G., Aldrich, A., & Theodorou, E. A. (2017). “Model Predictive Path Integral Control: From Theory to Parallel Computation.” Journal of Guidance, Control, and Dynamics, 40(2), 344-357.
- Williams, G., et al. (2017). “Information Theoretic MPC for Model-Based Reinforcement Learning.” ICRA 2017.
- Rubinstein, R. Y., & Kroese, D. P. (2013). The Cross-Entropy Method. Springer.