Simulated Annealing: Theory and Python Implementation

Covers the temperature schedule and transition probability of Simulated Annealing, with Python implementations for continuous optimization and TSP.

Introduction

Simulated Annealing (SA) is a metaheuristic optimization technique inspired by the annealing process in metallurgy. Just as slow cooling allows a metal’s crystal structure to settle into a minimum-energy state, SA gradually lowers a control parameter called temperature while searching for the global optimum of an objective function.

The key feature of SA is probabilistic acceptance of worsening solutions, enabling escape from local optima. Early iterations perform bold exploration at high temperature, transitioning to fine-grained local search as temperature decreases.

Algorithm

Metropolis Criterion

The core of SA is the Metropolis criterion. A transition from the current solution \(x\) to a neighboring solution \(x'\) is accepted with probability:

\[P(\text{accept}) = \begin{cases} 1 & \text{if } \Delta E \leq 0 \\ \exp\left(-\frac{\Delta E}{T}\right) & \text{if } \Delta E > 0 \end{cases} \tag{1}\]

where \(\Delta E = f(x') - f(x)\) and \(T\) is the current temperature.
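The acceptance rule in Equation (1) can be expressed directly as a small helper (a minimal sketch; `metropolis_accept` is an illustrative name, not part of the implementations below):

```python
import math
import random

def metropolis_accept(delta_e, T, rng=random.random):
    """Metropolis criterion: always accept improvements,
    accept worsening moves with probability exp(-ΔE / T)."""
    if delta_e <= 0:
        return True
    return rng() < math.exp(-delta_e / T)

# At T = 1, a move that worsens the objective by 1 is accepted with
# probability exp(-1) ≈ 0.37; at T = 0.1 the same move has probability
# exp(-10) ≈ 4.5e-5, so worsening moves become vanishingly rare.
```

Passing `rng` explicitly makes the rule easy to unit-test with a deterministic stand-in.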

Boltzmann Distribution Connection

If the search is run long enough at a fixed temperature \(T\), the distribution of visited solutions converges to the Boltzmann distribution:

\[P(x) \propto \exp\left(-\frac{f(x)}{T}\right) \tag{2}\]

As \(T \to 0\), the distribution concentrates on the optimal solution.
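Equation (2) can be checked numerically (an illustrative sketch with three hand-picked objective values; the best solution has \(f = 0\)):

```python
import numpy as np

# Objective values of three candidate solutions; the best has f = 0.
f = np.array([0.0, 1.0, 2.0])

def boltzmann(f, T):
    """Normalized Boltzmann probabilities P(x) ∝ exp(-f(x)/T)."""
    w = np.exp(-f / T)
    return w / w.sum()

p_hot = boltzmann(f, T=10.0)   # high T: close to uniform
p_cold = boltzmann(f, T=0.1)   # low T: mass piles onto the best solution
```

At high temperature all three solutions are nearly equally likely; at low temperature almost all probability mass sits on the optimum.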

Temperature Schedules

Geometric Cooling (Most Common)

\[T_{k+1} = \alpha \cdot T_k, \quad 0.9 \leq \alpha < 1 \tag{3}\]

\(\alpha = 0.95\) is a typical value; values closer to 1 (such as 0.995) cool more slowly, allowing more exploration at the cost of more iterations.

Linear Cooling

\[T_{k+1} = T_k - \delta \tag{4}\]

where \(\delta > 0\) is a fixed decrement. \(T\) must be kept strictly positive (e.g. clamped to a small floor), since the acceptance probability in Equation (1) divides by \(T\).

Logarithmic Cooling (Theoretical Guarantee)

\[T_k = \frac{T_0}{\ln(k + 1)} \tag{5}\]

This schedule provably converges to the global optimum in probability (for a sufficiently large \(T_0\)), but the cooling is so slow that it is impractical.
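The three schedules can be compared numerically (a small sketch; \(T_0 = 100\), \(\delta = 0.1\), and \(\alpha = 0.95\) are illustrative choices):

```python
import numpy as np

T0 = 100.0
k = np.arange(1, 1001)

T_geo = T0 * 0.95 ** k                 # geometric, Eq. (3)
T_lin = np.maximum(T0 - 0.1 * k, 0.0)  # linear, Eq. (4), clipped at zero
T_log = T0 / np.log(k + 1)             # logarithmic, Eq. (5)

# After 1000 steps the geometric schedule is effectively frozen and the
# linear one has hit zero, while the logarithmic one is still above T = 14.
```

This gap is exactly why logarithmic cooling, despite its convergence guarantee, is rarely used in practice.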

Python Implementation: Continuous Optimization

Optimizing the Rastrigin function (a standard multimodal benchmark with global minimum \(f(\mathbf{0}) = 0\)) with SA:

import numpy as np
import matplotlib.pyplot as plt

def rastrigin(x):
    """Rastrigin function: many regularly spaced local minima, global minimum 0 at the origin."""
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def simulated_annealing(objective, dim, bounds=(-5.12, 5.12),
                        T_init=100.0, alpha=0.995, max_iter=10000):
    """Simulated Annealing for continuous optimization"""
    x = np.random.uniform(*bounds, size=dim)
    best_x = x.copy()
    best_f = objective(x)
    current_f = best_f

    T = T_init
    history = [best_f]

    for i in range(max_iter):
        # Shrink the proposal step size as the temperature falls
        sigma = 0.5 * (T / T_init)
        x_new = x + np.random.normal(0, sigma, size=dim)
        x_new = np.clip(x_new, *bounds)

        new_f = objective(x_new)
        delta = new_f - current_f

        # Metropolis criterion (Eq. 1): always accept improvements,
        # sometimes accept worsening moves
        if delta <= 0 or np.random.random() < np.exp(-delta / T):
            x = x_new
            current_f = new_f
            if current_f < best_f:
                best_x = x.copy()
                best_f = current_f

        T *= alpha
        history.append(best_f)

    return best_x, best_f, history

# --- Run ---
np.random.seed(42)
best_x, best_f, history = simulated_annealing(rastrigin, dim=10)
print(f"Best fitness: {best_f:.6f}")

# --- Convergence plot ---
plt.figure(figsize=(10, 5))
plt.plot(history)
plt.xlabel('Iteration')
plt.ylabel('Best Objective Value')
plt.title('Simulated Annealing on Rastrigin Function (10D)')
plt.yscale('log')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Python Implementation: Traveling Salesman Problem (TSP)

SA is also effective for combinatorial optimization. Here we solve the TSP using a 2-opt neighborhood:

import numpy as np
import matplotlib.pyplot as plt

def total_distance(tour, cities):
    """Compute total tour distance"""
    n = len(tour)
    dist = 0
    for i in range(n):
        dist += np.linalg.norm(cities[tour[i]] - cities[tour[(i + 1) % n]])
    return dist

def two_opt_swap(tour, i, j):
    """2-opt swap: reverse tour[i:j+1]"""
    new_tour = tour.copy()
    new_tour[i:j+1] = tour[i:j+1][::-1]
    return new_tour

def sa_tsp(cities, T_init=1000.0, alpha=0.9995, max_iter=100000):
    """Simulated Annealing for TSP"""
    n = len(cities)
    tour = list(range(n))
    np.random.shuffle(tour)

    current_dist = total_distance(tour, cities)
    best_tour = tour.copy()
    best_dist = current_dist
    T = T_init
    history = [best_dist]

    for _ in range(max_iter):
        # Pick two distinct cut points and reverse the segment between them (2-opt move)
        i, j = sorted(np.random.choice(n, 2, replace=False))
        new_tour = two_opt_swap(tour, i, j)
        new_dist = total_distance(new_tour, cities)

        delta = new_dist - current_dist
        if delta <= 0 or np.random.random() < np.exp(-delta / T):
            tour = new_tour
            current_dist = new_dist
            if current_dist < best_dist:
                best_tour = tour.copy()
                best_dist = current_dist

        T *= alpha
        history.append(best_dist)

    return best_tour, best_dist, history

# --- Run ---
np.random.seed(42)
n_cities = 30
cities = np.random.rand(n_cities, 2) * 100

best_tour, best_dist, history = sa_tsp(cities)
print(f"Best distance: {best_dist:.2f}")

# --- Visualization ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

tour_cities = cities[best_tour + [best_tour[0]]]  # append the start city to close the loop
axes[0].plot(tour_cities[:, 0], tour_cities[:, 1], 'b-o', markersize=5)
axes[0].set_title(f'Best Tour (distance={best_dist:.2f})')
axes[0].set_aspect('equal')
axes[0].grid(True, alpha=0.3)

axes[1].plot(history)
axes[1].set_xlabel('Iteration')
axes[1].set_ylabel('Best Distance')
axes[1].set_title('SA Convergence on TSP')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Comparison with Other Methods

| Property | SA | GA (Genetic Algorithm) | CEM (Cross-Entropy Method) |
| --- | --- | --- | --- |
| Solutions | Single | Population-based | Population-based |
| Search mechanism | Probabilistic acceptance | Crossover/Mutation | Distribution update |
| Discrete problems | Strong | Strong | Weak |
| Parameters | Temperature schedule | Crossover/Mutation rates | Elite ratio |
| Parallelization | Difficult | Easy | Easy |

References

  • Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). “Optimization by Simulated Annealing”. Science, 220(4598), 671-680.
  • Metropolis, N., et al. (1953). “Equation of State Calculations by Fast Computing Machines”. The Journal of Chemical Physics, 21(6), 1087-1092.
  • Bertsimas, D., & Tsitsiklis, J. (1993). “Simulated annealing”. Statistical Science, 8(1), 10-15.