Introduction
Particle Swarm Optimization (PSO) is a metaheuristic optimization technique inspired by the collective behavior of bird flocks and fish schools. Proposed by Kennedy and Eberhart in 1995, it is a gradient-free black-box optimizer that has been widely used for continuous, multimodal problems.
The defining feature of PSO is that each particle maintains a position and a velocity, and is attracted both to its own best historical position (personal best, pbest) and to the best position discovered by the entire swarm (global best, gbest). The swarm collectively converges toward promising regions of the search space much faster than pure random search.
This article reviews the PSO update rules through the lens of the inertia weight \(w\), cognitive coefficient \(c_1\), and social coefficient \(c_2\), and walks through a Python implementation benchmarked on the Sphere and Rastrigin functions with convergence plots.
Algorithm
Particle State
Let the swarm size be \(N\) and the search space dimension be \(D\). Each particle \(i \in \{1, \dots, N\}\) keeps:
- position \(\mathbf{x}_i^t \in \mathbb{R}^D\)
- velocity \(\mathbf{v}_i^t \in \mathbb{R}^D\)
- personal best \(\mathbf{p}_i^t\) (best position particle \(i\) has visited so far)
The swarm shares a single global best position \(\mathbf{g}^t = \arg\min_{i} f(\mathbf{p}_i^t)\).
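As a quick illustration of the shared state, the sketch below computes \(\mathbf{g}^t\) as the argmin over personal-best fitnesses; the 5-particle swarm and the Sphere objective are hypothetical placeholders:

```python
import numpy as np

# Hypothetical 5-particle swarm in 3 dimensions
rng = np.random.default_rng(0)
pbest = rng.uniform(-5.12, 5.12, (5, 3))  # personal best positions p_i

def f(x):
    # Sphere objective, evaluated row-wise over particles
    return np.sum(x**2, axis=-1)

pbest_f = f(pbest)               # fitness of each personal best
g_idx = int(np.argmin(pbest_f))  # particle holding the best pbest
gbest = pbest[g_idx]             # shared global best position g^t
```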
Velocity Update
The velocity of each particle is updated by:
$$ \mathbf{v}_i^{t+1} = w \, \mathbf{v}_i^{t} + c_1 \mathbf{r}_1 \odot (\mathbf{p}_i^{t} - \mathbf{x}_i^{t}) + c_2 \mathbf{r}_2 \odot (\mathbf{g}^{t} - \mathbf{x}_i^{t}) \tag{1} $$
Here \(\mathbf{r}_1, \mathbf{r}_2 \sim U[0,1]^D\) are independent uniform random vectors and \(\odot\) is the element-wise product. The three terms are:
- Term 1 \(w \, \mathbf{v}_i^t\): inertia term, controlling how much of the previous velocity is retained
- Term 2 \(c_1 \mathbf{r}_1 \odot (\mathbf{p}_i^t - \mathbf{x}_i^t)\): cognitive term, pulling toward the particle’s own best (pbest)
- Term 3 \(c_2 \mathbf{r}_2 \odot (\mathbf{g}^t - \mathbf{x}_i^t)\): social term, pulling toward the swarm’s best (gbest)
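A single vectorized update step can be sketched as follows; the tiny swarm and the coefficient values here are illustrative placeholders, not tuned settings:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 4, 2                # tiny swarm, illustration only
w, c1, c2 = 0.7, 1.5, 1.5  # placeholder coefficients

x = rng.uniform(-1.0, 1.0, (N, D))  # positions
v = np.zeros((N, D))                # velocities start at rest
pbest = x.copy()                    # pbest initialized to current positions
gbest = x[0].copy()                 # pretend particle 0 holds the global best

r1 = rng.random((N, D))
r2 = rng.random((N, D))
# Eq. (1): inertia + cognitive pull + social pull
v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
# With pbest == x the cognitive term vanishes, and particle 0
# already sits on gbest, so its velocity stays zero this step.
x = x + v  # Eq. (2): advance positions by the new velocity
```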
Position Update
Position is advanced linearly by the new velocity:
\[ \mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + \mathbf{v}_i^{t+1} \tag{2} \]
Updating pbest and gbest
After each iteration, replace the personal best whenever the current position improves the objective:
\[ \mathbf{p}_i^{t+1} = \begin{cases} \mathbf{x}_i^{t+1} & \text{if } f(\mathbf{x}_i^{t+1}) < f(\mathbf{p}_i^{t}) \\ \mathbf{p}_i^{t} & \text{otherwise} \end{cases} \tag{3} \]
The global best is then refreshed:
\[ \mathbf{g}^{t+1} = \arg\min_{i} f(\mathbf{p}_i^{t+1}) \tag{4} \]
Hyperparameters
Inertia Weight \(w\)
\(w\) governs the trade-off between exploration and exploitation.
- Large \(w\) (e.g. \(w \approx 0.9\)): particles strongly retain previous velocity, encouraging wide-area search
- Small \(w\) (e.g. \(w \approx 0.4\)): velocity decays quickly, promoting fine-grained local refinement
In practice, linearly decreasing inertia weight (LDIW) is the de facto standard, exploring broadly early on and converging late:
\[ w_t = w_{\max} - (w_{\max} - w_{\min}) \cdot \frac{t}{T_{\max}} \tag{5} \]
Typical values are \(w_{\max} = 0.9\) and \(w_{\min} = 0.4\).
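Eq. (5) is easy to verify numerically; `ldiw` below is a hypothetical helper name for the schedule:

```python
import numpy as np

def ldiw(t, t_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight, Eq. (5)."""
    return w_max - (w_max - w_min) * (t / t_max)

# Evaluate the schedule over 200 iterations: 0.9 at t=0, 0.4 at t=T_max
schedule = np.array([ldiw(t, 200) for t in range(201)])
```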
Cognitive Coefficient \(c_1\) and Social Coefficient \(c_2\)
\(c_1\) and \(c_2\) weight the pull toward the particle’s own experience and the swarm’s experience, respectively.
- \(c_1 \gg c_2\) : particles favor their own history, leading to more independent search
- \(c_1 \ll c_2\) : pull toward the global best dominates, often causing premature convergence
Clerc & Kennedy (2002) derived a constriction-coefficient form with constriction factor \(\chi \approx 0.7298\) and \(c_1 = c_2 = 2.05\) inside the constriction. Mapped onto the inertia-weight form of Eq. (1), with \(\chi\) multiplied into each coefficient, this gives the widely used default \(w = 0.729,\ c_1 = c_2 = 1.4944\).
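For reference, the constriction factor follows from the standard formula (stated here under the usual assumption \(\varphi = c_1 + c_2 = 4.1\)):

\[ \chi = \frac{2}{\left|\, 2 - \varphi - \sqrt{\varphi^2 - 4\varphi} \,\right|}, \qquad \varphi = c_1 + c_2 > 4 \]

With \(\varphi = 4.1\), this gives \(\chi = 2 / (2.1 + \sqrt{0.41}) \approx 0.7298\).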
Velocity Clipping
To prevent particles from diverging out of the search domain, each velocity component is typically clipped to \(|v_{i,d}| \leq v_{\max}\), where \(v_{\max}\) is around 10–20% of the search range width.
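The clipping rule can be sketched with NumPy; the 20% ratio and the \([-5.12, 5.12]\) domain below are just example values:

```python
import numpy as np

bounds = (-5.12, 5.12)
v_max = 0.2 * (bounds[1] - bounds[0])  # 20% of the range width, ~2.048

v = np.array([[3.5, -0.1],
              [-4.0, 2.5]])            # example velocities, two particles in 2D
v_clipped = np.clip(v, -v_max, v_max)  # enforce |v| <= v_max componentwise
```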
Benchmark Functions
Sphere Function (Unimodal)
\[ f_{\text{sphere}}(\mathbf{x}) = \sum_{i=1}^{D} x_i^2 \tag{6} \]
The global minimum is \(f(\mathbf{0}) = 0\). Because the function is unimodal, PSO converges on it easily.
Rastrigin Function (Multimodal)
\[ f_{\text{rastrigin}}(\mathbf{x}) = 10 D + \sum_{i=1}^{D} \left[ x_i^2 - 10 \cos(2\pi x_i) \right] \tag{7} \]
The global minimum is again \(f(\mathbf{0}) = 0\), but the landscape has many local minima. With the wrong inertia weight or coefficients, the swarm may stagnate at a local minimum.
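A quick sanity check of both minima (the function definitions are repeated here so the snippet is self-contained):

```python
import numpy as np

def sphere(x):
    return np.sum(x**2, axis=-1)

def rastrigin(x):
    return 10 * x.shape[-1] + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=-1)

origin = np.zeros(10)
# Both benchmarks attain their global minimum of 0 at the origin (Eqs. 6, 7);
# for Rastrigin, points like the all-ones vector are nearby local minima.
```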
Python Implementation
A minimal NumPy-only PSO with linearly decreasing inertia weight and velocity clipping:
import numpy as np
import matplotlib.pyplot as plt

def sphere(x):
    """Sphere function (unimodal)"""
    return np.sum(x**2, axis=-1)

def rastrigin(x):
    """Rastrigin function (multimodal)"""
    return 10 * x.shape[-1] + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=-1)

class PSO:
    def __init__(self, dim, n_particles=40, bounds=(-5.12, 5.12),
                 w_max=0.9, w_min=0.4, c1=1.4944, c2=1.4944,
                 v_max_ratio=0.2):
        self.dim = dim
        self.n_particles = n_particles
        self.bounds = bounds
        self.w_max = w_max
        self.w_min = w_min
        self.c1 = c1
        self.c2 = c2
        # v_max is a fraction of the search range width
        self.v_max = v_max_ratio * (bounds[1] - bounds[0])

    def initialize(self):
        low, high = self.bounds
        x = np.random.uniform(low, high, (self.n_particles, self.dim))
        v = np.random.uniform(-self.v_max, self.v_max,
                              (self.n_particles, self.dim))
        return x, v

    def run(self, objective, max_iter=200):
        x, v = self.initialize()
        f = objective(x)

        # Initialize pbest / gbest
        pbest = x.copy()
        pbest_f = f.copy()
        g_idx = int(np.argmin(pbest_f))
        gbest = pbest[g_idx].copy()
        gbest_f = float(pbest_f[g_idx])
        history = [gbest_f]

        for t in range(max_iter):
            # Linearly decreasing inertia weight
            w = self.w_max - (self.w_max - self.w_min) * (t / max_iter)

            # Velocity update (Eq. 1)
            r1 = np.random.random((self.n_particles, self.dim))
            r2 = np.random.random((self.n_particles, self.dim))
            v = (w * v
                 + self.c1 * r1 * (pbest - x)
                 + self.c2 * r2 * (gbest - x))
            v = np.clip(v, -self.v_max, self.v_max)

            # Position update (Eq. 2)
            x = x + v
            x = np.clip(x, *self.bounds)

            # Evaluate and update pbest / gbest
            f = objective(x)
            improved = f < pbest_f
            pbest[improved] = x[improved]
            pbest_f[improved] = f[improved]
            g_idx = int(np.argmin(pbest_f))
            if pbest_f[g_idx] < gbest_f:
                gbest = pbest[g_idx].copy()
                gbest_f = float(pbest_f[g_idx])
            history.append(gbest_f)

        return gbest, gbest_f, history

# --- Run ---
np.random.seed(42)
pso_sphere = PSO(dim=10)
_, best_sphere, hist_sphere = pso_sphere.run(sphere, max_iter=200)

np.random.seed(42)
pso_rastrigin = PSO(dim=10)
_, best_rastrigin, hist_rastrigin = pso_rastrigin.run(rastrigin, max_iter=200)

print(f"Sphere best fitness: {best_sphere:.6e}")
print(f"Rastrigin best fitness: {best_rastrigin:.6e}")

# --- Convergence plot ---
plt.figure(figsize=(10, 5))
plt.plot(hist_sphere, label='Sphere (10D)')
plt.plot(hist_rastrigin, label='Rastrigin (10D)')
plt.xlabel('Iteration')
plt.ylabel('Best Objective Value')
plt.title('PSO Convergence on Sphere and Rastrigin')
plt.yscale('log')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()
On Sphere, PSO drives the objective down to roughly \(10^{-10}\) within tens of iterations. On Rastrigin, multimodality often causes early stagnation, and the choice of inertia weight, swarm size, and coefficients has a large effect on the final solution.
Effect of Inertia Weight on Convergence
Comparing different fixed \(w\) values is straightforward:
import numpy as np
import matplotlib.pyplot as plt

results = {}
for w in [0.3, 0.5, 0.7, 0.9]:
    np.random.seed(0)
    pso = PSO(dim=10, w_max=w, w_min=w)  # fixed inertia weight
    _, _, history = pso.run(rastrigin, max_iter=200)
    results[w] = history

plt.figure(figsize=(10, 5))
for w, history in results.items():
    plt.plot(history, label=f'w = {w}')
plt.xlabel('Iteration')
plt.ylabel('Best Objective Value')
plt.title('Effect of Inertia Weight on Rastrigin (10D)')
plt.yscale('log')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()
Typically, \(w = 0.3\) produces weak global exploration and is prone to local minima, while \(w = 0.9\) converges slowly. A dynamic schedule like LDIW is the most robust default.
Comparison with Other Optimizers
| Method | Population-based | Main mechanism | Multimodality | Expensive evaluations |
|---|---|---|---|---|
| PSO | Yes | Velocity / position update with pbest, gbest | Medium | Poor fit |
| GA | Yes | Selection / crossover / mutation | High | Poor fit |
| CEM | Yes | Iterative distribution update | Medium | Poor fit |
| SA | No | Metropolis acceptance with temperature | High | Poor fit |
| Bayesian Opt | No | Gaussian process surrogate | Medium | Excellent |
PSO has very few hyperparameters and a simple implementation, making it practical for medium-dimensional continuous problems. When evaluations are extremely expensive (e.g. real-world experiments), Bayesian optimization that minimizes evaluation count is usually more appropriate.
Summary
- PSO is built from velocity/position updates (Eqs. 1, 2) and pbest/gbest updates (Eqs. 3, 4)
- Inertia weight \(w\), cognitive coefficient \(c_1\), and social coefficient \(c_2\) weight diversity, self-experience, and swarm experience respectively
- Linearly decreasing inertia weight (Eq. 5) and velocity clipping are practically essential for multimodal problems like Rastrigin
- Convergence is fast on Sphere; on Rastrigin the configuration largely determines the outcome
Reading this together with the rest of the optimization series clarifies which method fits which problem class.
Related Articles
- Genetic Algorithms: Fundamentals and Python Implementation - Evolutionary metaheuristic. PSO moves continuously via velocity, while GA evolves discretely via crossover and mutation.
- Simulated Annealing: Theory and Python Implementation - Comparing PSO with the single-solution SA highlights the characteristics of population-based methods.
- Cross-Entropy Method: A Practical Monte Carlo Optimization Technique - CEM iteratively updates a probability distribution; the pull of all particles toward gbest in PSO has a similar flavor.
- Bayesian Optimization: Fundamentals and Python Implementation - Designed for expensive-evaluation settings. The problem regimes targeted by PSO and Bayesian optimization differ substantially.
References
- Kennedy, J., & Eberhart, R. (1995). “Particle Swarm Optimization”. Proceedings of IEEE International Conference on Neural Networks, 4, 1942-1948.
- Shi, Y., & Eberhart, R. (1998). “A modified particle swarm optimizer”. Proceedings of IEEE International Conference on Evolutionary Computation, 69-73.
- Clerc, M., & Kennedy, J. (2002). “The particle swarm – explosion, stability, and convergence in a multidimensional complex space”. IEEE Transactions on Evolutionary Computation, 6(1), 58-73.