Introduction
Simulated Annealing (SA) is a metaheuristic optimization technique inspired by the annealing process in metallurgy. Just as slow cooling allows a metal's crystal structure to settle into a minimum-energy state, SA gradually lowers a "temperature" parameter while searching for the global optimum of an objective function.
The key feature of SA is probabilistic acceptance of worsening solutions, enabling escape from local optima. Early iterations perform bold exploration at high temperature, transitioning to fine-grained local search as temperature decreases.
Algorithm
Metropolis Criterion
The core of SA is the Metropolis criterion. Transition from current solution \(x\) to neighbor \(x'\) is accepted with probability:
\[P(\text{accept}) = \begin{cases} 1 & \text{if } \Delta E \leq 0 \\ \exp\left(-\frac{\Delta E}{T}\right) & \text{if } \Delta E > 0 \end{cases} \tag{1}\]
where \(\Delta E = f(x') - f(x)\) and \(T\) is the current temperature.
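As a minimal sketch, Eq. (1) translates directly into code. The function name `metropolis_accept` is ours, not from any library:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_accept(delta_e, T):
    """Accept per Eq. (1): always if delta_e <= 0, else with prob exp(-delta_e / T)."""
    if delta_e <= 0:
        return True
    return rng.random() < np.exp(-delta_e / T)

# A worsening move of +1.0 is accepted with probability exp(-1/T):
print(np.exp(-1.0 / 1.0))  # ≈ 0.368 at T = 1
print(np.exp(-1.0 / 0.1))  # ≈ 4.5e-5 at T = 0.1
```

The same worsening move goes from being accepted roughly a third of the time at high temperature to essentially never at low temperature, which is exactly the exploration-to-exploitation transition described above.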
Boltzmann Distribution Connection
With sufficient exploration at temperature \(T\), the solution distribution converges to the Boltzmann distribution:
\[P(x) \propto \exp\left(-\frac{f(x)}{T}\right) \tag{2}\]
As \(T \to 0\), the distribution concentrates on the optimal solution.
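A quick numeric check of Eq. (2) on a toy four-state problem (the objective values are arbitrary, chosen only for illustration):

```python
import numpy as np

f = np.array([0.0, 0.5, 1.0, 2.0])  # objective values of four states

def boltzmann(f, T):
    """Normalized Boltzmann probabilities per Eq. (2)."""
    w = np.exp(-f / T)
    return w / w.sum()

print(boltzmann(f, T=2.0))   # high T: probability mass fairly spread out
print(boltzmann(f, T=0.05))  # low T: nearly all mass on the f = 0 state
```

At T = 2 the best state holds only about a third of the probability mass; at T = 0.05 it holds more than 99.9%, illustrating the concentration as \(T \to 0\).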
Temperature Schedules
Geometric Cooling (Most Common)
\[T_{k+1} = \alpha \cdot T_k, \quad 0.9 \leq \alpha < 1 \tag{3}\]
A typical value is \(\alpha = 0.95\).
Linear Cooling
\[T_{k+1} = T_k - \delta \tag{4}\]
Logarithmic Cooling (Theoretical Guarantee)
\[T_k = \frac{T_0}{\ln(k + 1)} \tag{5}\]
This schedule theoretically guarantees convergence to the global optimum, but the cooling is so slow that it is impractical.
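A small sketch comparing the three schedules after k iterations (the parameter values are illustrative, not tuned):

```python
import numpy as np

T0, alpha, delta = 100.0, 0.95, 0.01
k = np.array([1, 10, 100, 1000])

T_geo = T0 * alpha**k                    # geometric, Eq. (3)
T_lin = np.maximum(T0 - delta * k, 0.0)  # linear, Eq. (4), clipped at zero
T_log = T0 / np.log(k + 1)               # logarithmic, Eq. (5)

for name, T in [("geometric", T_geo), ("linear", T_lin), ("logarithmic", T_log)]:
    print(f"{name:12s}", np.round(T, 4))
```

After 1000 iterations the geometric schedule has cooled to essentially zero, while the logarithmic schedule is still above T ≈ 14, which is why its theoretical guarantee is of little practical use.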
Python Implementation: Continuous Optimization
Optimizing the Rastrigin function (multimodal benchmark) with SA:
```python
import numpy as np
import matplotlib.pyplot as plt


def rastrigin(x):
    """Rastrigin function: global minimum f(0) = 0, many local minima."""
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))


def simulated_annealing(objective, dim, bounds=(-5.12, 5.12),
                        T_init=100.0, alpha=0.995, max_iter=10000):
    """Simulated Annealing for continuous optimization."""
    x = np.random.uniform(*bounds, size=dim)
    best_x = x.copy()
    best_f = objective(x)
    current_f = best_f
    T = T_init
    history = [best_f]

    for i in range(max_iter):
        # Step size shrinks with temperature: broad moves early, local moves late
        sigma = 0.5 * (T / T_init)
        x_new = x + np.random.normal(0, sigma, size=dim)
        x_new = np.clip(x_new, *bounds)
        new_f = objective(x_new)
        delta = new_f - current_f

        # Metropolis criterion (Eq. 1)
        if delta <= 0 or np.random.random() < np.exp(-delta / T):
            x = x_new
            current_f = new_f
            if current_f < best_f:
                best_x = x.copy()
                best_f = current_f

        T *= alpha
        history.append(best_f)

    return best_x, best_f, history


# --- Run ---
np.random.seed(42)
best_x, best_f, history = simulated_annealing(rastrigin, dim=10)
print(f"Best fitness: {best_f:.6f}")

# --- Convergence plot ---
plt.figure(figsize=(10, 5))
plt.plot(history)
plt.xlabel('Iteration')
plt.ylabel('Best Objective Value')
plt.title('Simulated Annealing on Rastrigin Function (10D)')
plt.yscale('log')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
Python Implementation: Traveling Salesman Problem (TSP)
SA is effective for combinatorial optimization. Here we solve the TSP using a 2-opt neighborhood:
```python
import numpy as np
import matplotlib.pyplot as plt


def total_distance(tour, cities):
    """Compute total tour length (closed loop)."""
    n = len(tour)
    dist = 0.0
    for i in range(n):
        dist += np.linalg.norm(cities[tour[i]] - cities[tour[(i + 1) % n]])
    return dist


def two_opt_swap(tour, i, j):
    """2-opt move: reverse the segment tour[i:j+1]."""
    new_tour = tour.copy()
    new_tour[i:j+1] = tour[i:j+1][::-1]
    return new_tour


def sa_tsp(cities, T_init=1000.0, alpha=0.9995, max_iter=100000):
    """Simulated Annealing for TSP with a 2-opt neighborhood."""
    n = len(cities)
    tour = list(range(n))
    np.random.shuffle(tour)
    current_dist = total_distance(tour, cities)
    best_tour = tour.copy()
    best_dist = current_dist
    T = T_init
    history = [best_dist]

    for _ in range(max_iter):
        i, j = sorted(np.random.choice(n, 2, replace=False))
        new_tour = two_opt_swap(tour, i, j)
        new_dist = total_distance(new_tour, cities)
        delta = new_dist - current_dist

        # Metropolis criterion (Eq. 1)
        if delta <= 0 or np.random.random() < np.exp(-delta / T):
            tour = new_tour
            current_dist = new_dist
            if current_dist < best_dist:
                best_tour = tour.copy()
                best_dist = current_dist

        T *= alpha
        history.append(best_dist)

    return best_tour, best_dist, history


# --- Run ---
np.random.seed(42)
n_cities = 30
cities = np.random.rand(n_cities, 2) * 100
best_tour, best_dist, history = sa_tsp(cities)
print(f"Best distance: {best_dist:.2f}")

# --- Visualization ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
tour_cities = cities[best_tour + [best_tour[0]]]
axes[0].plot(tour_cities[:, 0], tour_cities[:, 1], 'b-o', markersize=5)
axes[0].set_title(f'Best Tour (distance={best_dist:.2f})')
axes[0].set_aspect('equal')
axes[0].grid(True, alpha=0.3)
axes[1].plot(history)
axes[1].set_xlabel('Iteration')
axes[1].set_ylabel('Best Distance')
axes[1].set_title('SA Convergence on TSP')
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
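One common refinement, not used above: since a 2-opt move replaces only two edges of the tour, the length change can be computed in O(1) instead of recomputing `total_distance` over all n edges every iteration. A sketch assuming the same `tour`/`cities` representation (`two_opt_delta` is our name for this helper):

```python
import numpy as np

def two_opt_delta(tour, cities, i, j):
    """O(1) length change from reversing tour[i:j+1].

    Only the edge entering the segment and the edge leaving it change;
    assumes 0 <= i < j <= n-1.
    """
    n = len(tour)
    if i == 0 and j == n - 1:
        return 0.0  # reversing the whole closed tour leaves its length unchanged
    a, b = cities[tour[i - 1]], cities[tour[i]]        # old edge entering the segment
    c, d = cities[tour[j]], cities[tour[(j + 1) % n]]  # old edge leaving the segment
    return (np.linalg.norm(a - c) + np.linalg.norm(b - d)
            - np.linalg.norm(a - b) - np.linalg.norm(c - d))
```

Inside the SA loop one would evaluate `delta = two_opt_delta(tour, cities, i, j)` first and build the reversed tour only when the move is accepted, reducing the per-iteration cost from O(n) to O(1) for rejected moves.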
Comparison with Other Methods
| Property | SA | GA (Genetic Algorithm) | CEM (Cross-Entropy Method) |
|---|---|---|---|
| Solutions | Single | Population-based | Population-based |
| Search mechanism | Probabilistic acceptance | Crossover/Mutation | Distribution update |
| Discrete problems | Strong | Strong | Weak |
| Parameters | Temperature schedule | Crossover/Mutation rates | Elite ratio |
| Parallelization | Difficult | Easy | Easy |
Related Articles
- Cross-Entropy Method: A Practical Monte Carlo Optimization Technique - Useful comparison with CEM’s distribution-update approach.
- Particle Swarm Optimization (PSO) Python Implementation - Swarm intelligence-based optimization for comparison.
- Bayesian Optimization: Fundamentals and Python Implementation - Efficient optimization for expensive evaluations.
- MPPI (Model Predictive Path Integral) Mathematics - Sampling-based optimal control.
References
- Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). “Optimization by Simulated Annealing”. Science, 220(4598), 671-680.
- Metropolis, N., et al. (1953). “Equation of State Calculations by Fast Computing Machines”. The Journal of Chemical Physics, 21(6), 1087-1092.
- Bertsimas, D., & Tsitsiklis, J. (1993). “Simulated annealing”. Statistical Science, 8(1), 10-15.