Cross-Entropy Method: A Practical Monte Carlo Optimization Technique

An explanation of the Cross-Entropy Method (CEM) for optimization, covering Monte Carlo sampling, importance sampling, and iterative elite-based parameter updates.

The Cross-Entropy Method (CEM) is a Monte Carlo technique based on importance sampling, used both as an algorithm for optimization problems and for estimating rare-event probabilities.

Monte Carlo Sampling

The Monte Carlo method is a general term for numerical computation techniques that use random numbers to find approximate solutions to complex problems.

For example, suppose we want to find the expected value $l = \mathbb{E}[H(X)] = \int H(x)f(x)dx$ of a function $H(X)$, where the random variable $X$ has probability density function $f(x)$. Monte Carlo sampling approximates this expected value using $N$ independent and identically distributed samples $X_1, \dots, X_N$:

$$ \hat{l}_{MC} = \frac{1}{N} \sum_{i=1}^{N} H(X_i) $$
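As a minimal sketch of this estimator (the setup is assumed for illustration: $H(x) = x^2$ with $X \sim \mathcal{N}(0, 1)$, whose true expectation is $\mathbb{E}[X^2] = 1$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: H(x) = x^2 with X ~ N(0, 1), so E[H(X)] = 1.
def H(x):
    return x ** 2

N = 100_000
X = rng.standard_normal(N)  # X_1, ..., X_N drawn i.i.d. from f(x)
l_mc = H(X).mean()          # (1/N) * sum_i H(X_i)
print(l_mc)                 # close to the true value 1.0
```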

Importance Sampling

In Monte Carlo sampling, samples are generated directly from the probability distribution $f(x)$ for which we want to compute the expected value. However, if the region where $H(x)$ takes large values falls in a low-probability region of $f(x)$, efficient sampling becomes difficult.

In importance sampling, instead of sampling from the target distribution $f(x)$, samples are generated from a different sampling distribution $g(x)$, and each sample's contribution is weighted by the likelihood ratio $w(x) = f(x)/g(x)$:

$$ l = \mathbb{E}_g\!\left[ H(X)\,\frac{f(X)}{g(X)} \right] \approx \hat{l}_{IS} = \frac{1}{N} \sum_{i=1}^N H(X_i)\, \frac{f(X_i)}{g(X_i)} $$

Here, $X_i \sim g(x)$. The efficiency of importance sampling depends heavily on the choice of sampling distribution $g(x)$. In particular, the optimal sampling distribution $g^*(x)$ that minimizes the variance of the estimator is known to take the form $g^*(x) \propto |H(x)|f(x)$. In general, however, the normalizing constant of $g^*(x)$ involves the unknown quantity $l$ itself, so $g^*$ cannot be sampled from directly; this motivates searching for a good approximation within a parametric family.
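The gain from a well-chosen $g(x)$ is easiest to see on a rare-event example. The following sketch (an assumed setup, not from the original text) estimates $l = P(X > 4)$ for $X \sim \mathcal{N}(0, 1)$, a probability of about $3.2 \times 10^{-5}$ that plain Monte Carlo with the same sample budget would almost never hit:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Target: l = P(X > 4) with X ~ N(0, 1), i.e. H(x) = 1{x > 4}.
# Proposal g(x) = N(4, 1) concentrates samples on the rare region.
N = 100_000
f = stats.norm(0, 1)                 # target density f(x)
g = stats.norm(4, 1)                 # sampling density g(x)

X = g.rvs(size=N, random_state=rng)  # X_i ~ g(x)
w = f.pdf(X) / g.pdf(X)              # weights w(X_i) = f(X_i) / g(X_i)
H = (X > 4).astype(float)

l_is = (H * w).mean()
print(l_is)  # close to 1 - Phi(4) ≈ 3.17e-5
```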

Cross-Entropy Method (CEM)

The Cross-Entropy Method is an algorithm for iteratively finding a distribution close to the optimal sampling distribution $g^*(x)$ in importance sampling.

“Cross-entropy” is a quantity that measures, for a fixed distribution $p$, how closely a distribution $q$ matches it, defined as:

$$ H(p, q) = -\int p(x) \log q(x) dx $$

From the relationship $KL(p\|q) = \int p(x) \log \frac{p(x)}{q(x)} dx = H(p,q) - H(p)$, where $H(p)$ is the entropy of $p$: since $H(p)$ does not depend on $q$, minimizing the KL divergence over $q$ with $p$ fixed is equivalent to minimizing the cross-entropy $H(p,q)$.
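The identity is easy to verify numerically; here is a quick check with two discrete distributions whose values are chosen purely for illustration:

```python
import numpy as np

# Two assumed discrete distributions over three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

H_p  = -(p * np.log(p)).sum()      # entropy H(p)
H_pq = -(p * np.log(q)).sum()      # cross-entropy H(p, q)
kl   =  (p * np.log(p / q)).sum()  # KL divergence KL(p || q)

print(np.isclose(kl, H_pq - H_p))  # True
```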

CEM optimizes the parameters $v$ of a parameterized sampling distribution $g(x;v)$ through the following procedure:

  1. Initialization: Initialize parameter $v$ and set up the sampling distribution $g(x;v)$.
  2. Sampling: Generate $N$ samples $X_1, \dots, X_N$ from the current sampling distribution $g(x;v)$.
  3. Evaluation: Compute the objective function $H(X_i)$ for each sample $X_i$.
  4. Elite Sample Selection: Select the top $P$ percent of samples with the highest (or lowest, depending on the optimization direction) objective function values as “elite samples.”
  5. Parameter Update: Update the parameters $v$ of the sampling distribution $g(x;v)$ using the elite samples. This update maximizes the likelihood of the elite samples, which is equivalent to minimizing the cross-entropy between the empirical distribution of elite samples and $g(x;v)$.
  6. Convergence Check: Repeat from step 2 until the change in parameter $v$ is sufficiently small or the maximum number of iterations is reached.

Through this iterative process, the sampling distribution $g(x;v)$ gradually concentrates on the region of the optimal solution, enabling efficient search.
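To make the procedure concrete, here is a minimal sketch of CEM with a one-dimensional Gaussian sampling distribution $g(x; v)$, $v = (\mu, \sigma)$. The objective $H(x) = -(x - 2)^2$ (optimum at $x = 2$) and all hyperparameters are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical objective to maximize; the optimum is x = 2.
def H(x):
    return -(x - 2.0) ** 2

mu, sigma = 0.0, 5.0          # step 1: initialize v = (mu, sigma)
N, elite_frac = 100, 0.1      # sample size and elite fraction P
n_elite = int(N * elite_frac)

for _ in range(50):
    X = rng.normal(mu, sigma, size=N)          # step 2: sample from g(x; v)
    scores = H(X)                              # step 3: evaluate H(X_i)
    elites = X[np.argsort(scores)[-n_elite:]]  # step 4: keep the top P percent
    # Step 5: maximum-likelihood Gaussian fit to the elites,
    # i.e. cross-entropy minimization within this parametric family.
    mu, sigma = elites.mean(), elites.std()
    if sigma < 1e-6:                           # step 6: convergence check
        break

print(mu)  # converges to roughly 2.0
```

In practice the raw maximum-likelihood update is often smoothed, e.g. $v \leftarrow \alpha v_{\text{new}} + (1 - \alpha) v_{\text{old}}$, to keep $\sigma$ from collapsing before the search has concentrated on the right region.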
