The Cross-Entropy Method (CEM) is a Monte Carlo technique based on importance sampling, used as an algorithm for optimization problems and for rare-event probability estimation.
Monte Carlo Sampling
The Monte Carlo method is a general term for numerical computation techniques that use random numbers to find approximate solutions to complex problems.
For example, suppose we want to find the expected value \(l = \mathbb{E}[H(X)] = \int H(x)f(x)dx\) of a function \(H(X)\), where random variable \(X\) follows probability density function \(f(x)\). In Monte Carlo sampling, this expected value is approximated using \(N\) independent and identically distributed samples \(X_1, \dots, X_N\):
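As a minimal sketch of this estimator in Python (the choice of \(H(x) = x^2\) with a standard normal \(f\) is purely illustrative; the exact value is then \(\mathbb{E}[X^2] = 1\)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative objective: H(x) = x^2, with X ~ f = N(0, 1),
# so the exact expectation is E[X^2] = Var(X) = 1.
def H(x):
    return x**2

N = 100_000
samples = rng.standard_normal(N)   # X_1, ..., X_N drawn i.i.d. from f
l_ms = np.mean(H(samples))         # (1/N) * sum_i H(X_i)
print(l_ms)                        # close to 1.0
```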
\[ l_{MS} = \frac{1}{N} \sum_{i=1}^{N} H(X_i) \]
Importance Sampling
In Monte Carlo sampling, samples are generated directly from the probability distribution \(f(x)\) for which we want to compute the expected value. However, if the region where \(H(x)\) takes large values falls in a low-probability region of \(f(x)\), efficient sampling becomes difficult.
In importance sampling, instead of sampling from the target distribution \(f(x)\), samples are generated from a different sampling distribution \(g(x)\). The expected value is then estimated by multiplying those samples by weights \(w(x) = f(x)/g(x)\).
\[ l_{IS} = \frac{1}{N} \sum_{i=1}^{N} H(X_i) \frac{f(X_i)}{g(X_i)}, \quad X_i \sim g(x) \]
This is an unbiased estimator of \(l\), since \(\mathbb{E}_g\left[ H(X)\frac{f(X)}{g(X)} \right] = \mathbb{E}_f[H(X)]\). The efficiency of importance sampling depends heavily on the choice of sampling distribution \(g(x)\). In particular, the optimal sampling distribution \(g^*(x)\) that minimizes the variance of the estimator is known to take the form \(g^*(x) \propto |H(x)|f(x)\).
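Importance sampling pays off most for rare events. As a sketch, consider estimating \(l = P(X > 3)\) for \(X \sim N(0,1)\), i.e. \(H(x) = \mathbb{1}\{x > 3\}\). Direct sampling from \(f\) almost never produces \(X > 3\), so we sample from a shifted proposal \(g = N(4, 1)\) (chosen here for illustration) and reweight:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def H(x):                 # indicator of the rare event {x > 3}
    return (x > 3).astype(float)

def log_f(x):             # log density of N(0, 1), up to the shared constant
    return -0.5 * x**2

def log_g(x):             # log density of the proposal N(4, 1)
    return -0.5 * (x - 4.0)**2

N = 100_000
X = rng.normal(4.0, 1.0, size=N)        # X_i ~ g(x)
w = np.exp(log_f(X) - log_g(X))         # importance weights w(X_i) = f/g
l_is = np.mean(H(X) * w)                # weighted Monte Carlo average

exact = 0.5 * (1 - erf(3 / sqrt(2)))    # P(X > 3), about 0.00135
print(l_is, exact)
```

With \(N = 10^5\) direct samples from \(f\), only about 135 would land above 3; the shifted proposal concentrates all samples near the event and sharply reduces the estimator's variance.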
Cross-Entropy Method (CEM)
The Cross-Entropy Method is an algorithm for iteratively finding a distribution close to the optimal sampling distribution \(g^*(x)\) in importance sampling.
“Cross-entropy” is a measure of similarity between two probability distributions \(p\) and \(q\), defined as:
\[ H(p, q) = -\int p(x) \log q(x)\, dx \]
From the relationship \(KL(p\|q) = \int p(x) \log \frac{p(x)}{q(x)} dx = H(p,q) - H(p)\), when the true distribution \(p\) is fixed, minimizing the KL divergence is equivalent to minimizing the cross-entropy.
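The discrete analogue of this identity (sums in place of integrals) can be checked numerically for two small, arbitrarily chosen distributions:

```python
import numpy as np

# Two illustrative discrete distributions over three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

cross_entropy = -np.sum(p * np.log(q))   # H(p, q)
entropy = -np.sum(p * np.log(p))         # H(p)
kl = np.sum(p * np.log(p / q))           # KL(p || q)

# Verify H(p, q) = KL(p || q) + H(p).
print(np.isclose(cross_entropy, kl + entropy))  # True
```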
CEM optimizes the parameters \(v\) of a parameterized sampling distribution \(g(x;v)\) through the following procedure:
- Initialization: Initialize parameter \(v\) and set up the sampling distribution \(g(x;v)\).
- Sampling: Generate \(N\) samples \(X_1, \dots, X_N\) from the current sampling distribution \(g(x;v)\).
- Evaluation: Compute the objective function \(H(X_i)\) for each sample \(X_i\).
- Elite Sample Selection: Select the top \(P\) percent of samples with the highest (or lowest, depending on the optimization direction) objective function values as “elite samples.”
- Parameter Update: Update the parameters \(v\) of the sampling distribution \(g(x;v)\) using the elite samples. This update maximizes the likelihood of the elite samples, which is equivalent to minimizing the cross-entropy between the empirical distribution of elite samples and \(g(x;v)\).
- Convergence Check: Repeat from the Sampling step until the change in parameter \(v\) is sufficiently small or the maximum number of iterations is reached.
Through this iterative process, the sampling distribution \(g(x;v)\) gradually concentrates on the region of the optimal solution, enabling efficient search.
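The procedure above can be sketched for a one-dimensional minimization problem with a Gaussian sampling distribution \(g(x; v) = N(\mu, \sigma^2)\). For a Gaussian family, the cross-entropy-minimizing (maximum-likelihood) update is simply the sample mean and standard deviation of the elite samples. The objective and all constants below are illustrative choices, not part of the algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    return (x - 2.0) ** 2              # toy objective, minimum at x = 2

mu, sigma = 0.0, 5.0                   # Initialization of parameters v
N, elite_frac = 100, 0.1               # sample size and elite proportion P
n_elite = int(N * elite_frac)

for _ in range(50):
    X = rng.normal(mu, sigma, size=N)           # Sampling from g(x; v)
    scores = objective(X)                        # Evaluation
    elites = X[np.argsort(scores)[:n_elite]]    # Elite selection (lowest scores)
    mu, sigma = elites.mean(), elites.std()     # Maximum-likelihood update
    if sigma < 1e-6:                            # Convergence check
        break

print(mu)   # close to the optimum x = 2
```

Note that in practice \(\sigma\) can collapse prematurely; implementations often add a small smoothing term or noise floor to the standard-deviation update.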
Related Articles
- Bayesian Optimization: Fundamentals and Python Implementation - Another black-box optimization approach that uses Gaussian process surrogate models instead of sampling-based search.
- MPPI (Model Predictive Path Integral): A Unified View with the Cross-Entropy Method - Extends CEM’s importance sampling framework to stochastic optimal control, with a unified mathematical perspective.
References
- Rubinstein, R. Y., & Kroese, D. P. (2013). The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Springer Science & Business Media.
- Urushihara, T., “Application of Large Deviation Theory to Importance Sampling in Monte Carlo Simulation”