Simulated Annealing: Theory and Python Implementation

Covers the temperature schedule and transition probability of Simulated Annealing, with Python implementations for continuous optimization and TSP.

Introduction

Simulated Annealing (SA) is a metaheuristic optimization technique inspired by the annealing process in metallurgy. Just as slow cooling allows a metal’s crystal structure to settle into a minimum-energy state, SA gradually lowers a control parameter called temperature while searching for the global optimum of an objective function.

The key feature of SA is probabilistic acceptance of worsening solutions, enabling escape from local optima. Early iterations perform bold exploration at high temperature, transitioning to fine-grained local search as temperature decreases.

Algorithm

Metropolis Criterion

The core of SA is the Metropolis criterion. A transition from the current solution \(x\) to a neighboring solution \(x'\) is accepted with probability:

\[P(\text{accept}) = \begin{cases} 1 & \text{if } \Delta E \leq 0 \\ \exp\left(-\frac{\Delta E}{T}\right) & \text{if } \Delta E > 0 \end{cases} \tag{1}\]

where \(\Delta E = f(x') - f(x)\) and \(T\) is the current temperature.
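The acceptance rule in Equation (1) can be expressed directly as a small helper (a minimal sketch; `metropolis_accept` is an illustrative name, not part of the implementations below):

```python
import math
import random

def metropolis_accept(delta_e, T, rng=random.random):
    """Metropolis criterion: always accept improvements,
    accept worsening moves with probability exp(-ΔE / T)."""
    if delta_e <= 0:
        return True
    return rng() < math.exp(-delta_e / T)

# At T = 1, a move that worsens the objective by 1 is accepted with
# probability exp(-1) ≈ 0.37; at T = 0.1 the same move has probability
# exp(-10) ≈ 4.5e-5, so worsening moves become vanishingly rare.
```

Passing `rng` explicitly makes the rule easy to unit-test with a deterministic stand-in.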

Boltzmann Distribution Connection

If the search is run long enough at a fixed temperature \(T\), the distribution of visited solutions converges to the Boltzmann distribution:

\[P(x) \propto \exp\left(-\frac{f(x)}{T}\right) \tag{2}\]

As \(T \to 0\), the distribution concentrates on the optimal solution.
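Equation (2) can be checked numerically (an illustrative sketch with three hand-picked objective values; the best solution has \(f = 0\)):

```python
import numpy as np

# Objective values of three candidate solutions; the best has f = 0.
f = np.array([0.0, 1.0, 2.0])

def boltzmann(f, T):
    """Normalized Boltzmann probabilities P(x) ∝ exp(-f(x)/T)."""
    w = np.exp(-f / T)
    return w / w.sum()

p_hot = boltzmann(f, T=10.0)   # high T: close to uniform
p_cold = boltzmann(f, T=0.1)   # low T: mass piles onto the best solution
```

At high temperature all three solutions are nearly equally likely; at low temperature almost all probability mass sits on the optimum.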

Temperature Schedules

Geometric Cooling (Most Common)

\[T_{k+1} = \alpha \cdot T_k, \quad 0.9 \leq \alpha < 1 \tag{3}\]

\(\alpha = 0.95\) is a typical value; values closer to 1 (such as 0.995) cool more slowly, allowing more exploration at the cost of more iterations.

Linear Cooling

\[T_{k+1} = T_k - \delta \tag{4}\]

where \(\delta > 0\) is a fixed decrement. \(T\) must be kept strictly positive (e.g. clamped to a small floor), since the acceptance probability in Equation (1) divides by \(T\).

Logarithmic Cooling (Theoretical Guarantee)

\[T_k = \frac{T_0}{\ln(k + 1)} \tag{5}\]

This schedule provably converges to the global optimum in probability (for a sufficiently large \(T_0\)), but the cooling is so slow that it is impractical.
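The three schedules can be compared numerically (a small sketch; \(T_0 = 100\), \(\delta = 0.1\), and \(\alpha = 0.95\) are illustrative choices):

```python
import numpy as np

T0 = 100.0
k = np.arange(1, 1001)

T_geo = T0 * 0.95 ** k                 # geometric, Eq. (3)
T_lin = np.maximum(T0 - 0.1 * k, 0.0)  # linear, Eq. (4), clipped at zero
T_log = T0 / np.log(k + 1)             # logarithmic, Eq. (5)

# After 1000 steps the geometric schedule is effectively frozen and the
# linear one has hit zero, while the logarithmic one is still above T = 14.
```

This gap is exactly why logarithmic cooling, despite its convergence guarantee, is rarely used in practice.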

Python Implementation: Continuous Optimization

Optimizing the Rastrigin function (a standard multimodal benchmark with global minimum \(f(\mathbf{0}) = 0\)) with SA:

import numpy as np
import matplotlib.pyplot as plt

def rastrigin(x):
    """Rastrigin function: many regularly spaced local minima, global minimum 0 at the origin."""
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def simulated_annealing(objective, dim, bounds=(-5.12, 5.12),
                        T_init=100.0, alpha=0.995, max_iter=10000):
    """Simulated Annealing for continuous optimization"""
    x = np.random.uniform(*bounds, size=dim)
    best_x = x.copy()
    best_f = objective(x)
    current_f = best_f

    T = T_init
    history = [best_f]

    for i in range(max_iter):
        # Shrink the proposal step size as the temperature falls
        sigma = 0.5 * (T / T_init)
        x_new = x + np.random.normal(0, sigma, size=dim)
        x_new = np.clip(x_new, *bounds)

        new_f = objective(x_new)
        delta = new_f - current_f

        # Metropolis criterion (Eq. 1): always accept improvements,
        # sometimes accept worsening moves
        if delta <= 0 or np.random.random() < np.exp(-delta / T):
            x = x_new
            current_f = new_f
            if current_f < best_f:
                best_x = x.copy()
                best_f = current_f

        T *= alpha
        history.append(best_f)

    return best_x, best_f, history

# --- Run ---
np.random.seed(42)
best_x, best_f, history = simulated_annealing(rastrigin, dim=10)
print(f"Best fitness: {best_f:.6f}")

# --- Convergence plot ---
plt.figure(figsize=(10, 5))
plt.plot(history)
plt.xlabel('Iteration')
plt.ylabel('Best Objective Value')
plt.title('Simulated Annealing on Rastrigin Function (10D)')
plt.yscale('log')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Python Implementation: Traveling Salesman Problem (TSP)

SA is also effective for combinatorial optimization. Here we solve the TSP using a 2-opt neighborhood:

import numpy as np
import matplotlib.pyplot as plt

def total_distance(tour, cities):
    """Compute total tour distance"""
    n = len(tour)
    dist = 0
    for i in range(n):
        dist += np.linalg.norm(cities[tour[i]] - cities[tour[(i + 1) % n]])
    return dist

def two_opt_swap(tour, i, j):
    """2-opt swap: reverse tour[i:j+1]"""
    new_tour = tour.copy()
    new_tour[i:j+1] = tour[i:j+1][::-1]
    return new_tour

def sa_tsp(cities, T_init=1000.0, alpha=0.9995, max_iter=100000):
    """Simulated Annealing for TSP"""
    n = len(cities)
    tour = list(range(n))
    np.random.shuffle(tour)

    current_dist = total_distance(tour, cities)
    best_tour = tour.copy()
    best_dist = current_dist
    T = T_init
    history = [best_dist]

    for _ in range(max_iter):
        # Pick two distinct cut points and reverse the segment between them (2-opt move)
        i, j = sorted(np.random.choice(n, 2, replace=False))
        new_tour = two_opt_swap(tour, i, j)
        new_dist = total_distance(new_tour, cities)

        delta = new_dist - current_dist
        if delta <= 0 or np.random.random() < np.exp(-delta / T):
            tour = new_tour
            current_dist = new_dist
            if current_dist < best_dist:
                best_tour = tour.copy()
                best_dist = current_dist

        T *= alpha
        history.append(best_dist)

    return best_tour, best_dist, history

# --- Run ---
np.random.seed(42)
n_cities = 30
cities = np.random.rand(n_cities, 2) * 100

best_tour, best_dist, history = sa_tsp(cities)
print(f"Best distance: {best_dist:.2f}")

# --- Visualization ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

tour_cities = cities[best_tour + [best_tour[0]]]  # append the start city to close the loop
axes[0].plot(tour_cities[:, 0], tour_cities[:, 1], 'b-o', markersize=5)
axes[0].set_title(f'Best Tour (distance={best_dist:.2f})')
axes[0].set_aspect('equal')
axes[0].grid(True, alpha=0.3)

axes[1].plot(history)
axes[1].set_xlabel('Iteration')
axes[1].set_ylabel('Best Distance')
axes[1].set_title('SA Convergence on TSP')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Comparison with Other Methods

| Property | SA | GA (Genetic Algorithm) | CEM (Cross-Entropy Method) |
| --- | --- | --- | --- |
| Solutions | Single | Population-based | Population-based |
| Search mechanism | Probabilistic acceptance | Crossover/Mutation | Distribution update |
| Discrete problems | Strong | Strong | Weak |
| Parameters | Temperature schedule | Crossover/Mutation rates | Elite ratio |
| Parallelization | Difficult | Easy | Easy |

References

  • Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). “Optimization by Simulated Annealing”. Science, 220(4598), 671-680.
  • Metropolis, N., et al. (1953). “Equation of State Calculations by Fast Computing Machines”. The Journal of Chemical Physics, 21(6), 1087-1092.
  • Bertsimas, D., & Tsitsiklis, J. (1993). “Simulated annealing”. Statistical Science, 8(1), 10-15.