Particle Swarm Optimization (PSO): Math and Python Implementation with Inertia Weight, Cognitive and Social Coefficients

An in-depth guide to Particle Swarm Optimization (PSO): velocity and position update equations, the role of the inertia weight w and coefficients c1, c2, and a Python implementation benchmarked on Sphere and Rastrigin functions with convergence visualization.

Introduction

Particle Swarm Optimization (PSO) is a metaheuristic optimization technique inspired by the collective behavior of bird flocks and fish schools. Proposed by Kennedy and Eberhart in 1995, it is a gradient-free black-box optimizer that has been widely used for continuous, multimodal problems.

The defining feature of PSO is that each particle maintains a position and a velocity, and is attracted both to its own best historical position (personal best, pbest) and to the best position discovered by the entire swarm (global best, gbest). The swarm collectively converges toward promising regions of the search space much faster than pure random search.

This article reviews the PSO update rules through the lens of the inertia weight \(w\) , cognitive coefficient \(c_1\) , and social coefficient \(c_2\) , and walks through a Python implementation benchmarked on the Sphere and Rastrigin functions with convergence plots.

Algorithm

Particle State

Let the swarm size be \(N\) and the search space dimension be \(D\) . Each particle \(i \in \{1, \dots, N\}\) keeps:

  • position \(\mathbf{x}_i^t \in \mathbb{R}^D\)
  • velocity \(\mathbf{v}_i^t \in \mathbb{R}^D\)
  • personal best \(\mathbf{p}_i^t\) (best position particle \(i\) has visited so far)

The swarm shares a single global best position \(\mathbf{g}^t = \arg\min_{i} f(\mathbf{p}_i^t)\) .
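As a concrete sketch, this state maps naturally onto NumPy arrays (a minimal illustration using the Sphere objective; the array names mirror the implementation later in this article but are otherwise illustrative):

```python
import numpy as np

N, D = 40, 10                       # swarm size and search dimension
low, high = -5.12, 5.12             # search bounds

rng = np.random.default_rng(0)
x = rng.uniform(low, high, (N, D))  # positions  x_i^t, one row per particle
v = np.zeros((N, D))                # velocities v_i^t
pbest = x.copy()                    # personal bests p_i^t (initially the start positions)

f = np.sum(pbest**2, axis=-1)       # objective values, here the Sphere function
g_idx = int(np.argmin(f))           # index of the best particle
gbest = pbest[g_idx]                # shared global best g^t
```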

Velocity Update

The velocity of each particle is updated by:

$$ \mathbf{v}_i^{t+1} = w \, \mathbf{v}_i^{t} + c_1 \mathbf{r}_1 \odot (\mathbf{p}_i^{t} - \mathbf{x}_i^{t}) + c_2 \mathbf{r}_2 \odot (\mathbf{g}^{t} - \mathbf{x}_i^{t}) \tag{1} $$

Here \(\mathbf{r}_1, \mathbf{r}_2 \sim U[0,1]^D\) are independent uniform random vectors and \(\odot\) is the element-wise product. The three terms are:

  • Term 1 \(w \, \mathbf{v}_i^t\) : inertia term, controlling how much of the previous velocity is retained
  • Term 2 \(c_1 \mathbf{r}_1 \odot (\mathbf{p}_i^t - \mathbf{x}_i^t)\) : cognitive term, pulling toward the particle’s own best (pbest)
  • Term 3 \(c_2 \mathbf{r}_2 \odot (\mathbf{g}^t - \mathbf{x}_i^t)\) : social term, pulling toward the swarm’s best (gbest)
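The three terms of Eq. (1) translate directly into a vectorized NumPy update. A minimal sketch (the pbest and gbest values here are dummies, just to show shapes and broadcasting):

```python
import numpy as np

N, D = 40, 10
w, c1, c2 = 0.729, 1.4944, 1.4944

rng = np.random.default_rng(1)
x = rng.uniform(-5.12, 5.12, (N, D))    # current positions
v = rng.uniform(-1.0, 1.0, (N, D))      # current velocities
pbest = x + rng.normal(0, 0.1, (N, D))  # personal bests (dummy values)
gbest = x[0]                            # global best (dummy value), shape (D,)

r1 = rng.random((N, D))                 # r1, r2 ~ U[0,1]^D, drawn fresh each iteration
r2 = rng.random((N, D))
v_new = (w * v                          # inertia term
         + c1 * r1 * (pbest - x)        # cognitive term
         + c2 * r2 * (gbest - x))       # social term; gbest broadcasts over the swarm
```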

Position Update

Position is advanced linearly by the new velocity:

\[ \mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + \mathbf{v}_i^{t+1} \tag{2} \]

Updating pbest and gbest

After each iteration, replace the personal best whenever the current position improves the objective:

\[ \mathbf{p}_i^{t+1} = \begin{cases} \mathbf{x}_i^{t+1} & \text{if } f(\mathbf{x}_i^{t+1}) < f(\mathbf{p}_i^{t}) \\ \mathbf{p}_i^{t} & \text{otherwise} \end{cases} \tag{3} \]

The global best is then refreshed:

\[ \mathbf{g}^{t+1} = \arg\min_{i} f(\mathbf{p}_i^{t+1}) \tag{4} \]
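Eqs. (3) and (4) can be applied to the whole swarm at once with boolean masking. A sketch with dummy values standing in for the post-update positions and previous bests:

```python
import numpy as np

rng = np.random.default_rng(2)
x_new = rng.uniform(-5.12, 5.12, (40, 10))   # positions after Eq. (2)
pbest = rng.uniform(-5.12, 5.12, (40, 10))   # previous personal bests
f = lambda z: np.sum(z**2, axis=-1)          # objective (Sphere)

pbest_f = f(pbest)
f_new = f(x_new)

improved = f_new < pbest_f                   # Eq. (3): replace only where improved
pbest[improved] = x_new[improved]
pbest_f[improved] = f_new[improved]

gbest = pbest[np.argmin(pbest_f)]            # Eq. (4): best of all personal bests
```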

Hyperparameters

Inertia Weight \(w\)

\(w\) governs the trade-off between exploration and exploitation.

  • Large \(w\) (e.g. \(w \approx 0.9\) ): particles strongly retain previous velocity, encouraging wide-area search
  • Small \(w\) (e.g. \(w \approx 0.4\) ): velocity decays quickly, promoting fine-grained local refinement

In practice, a linearly decreasing inertia weight (LDIW) is a widely used default: the swarm explores broadly in early iterations and shifts to local refinement as \(w\) shrinks:

\[ w_t = w_{\max} - (w_{\max} - w_{\min}) \cdot \frac{t}{T_{\max}} \tag{5} \]

Typical values are \(w_{\max} = 0.9\) and \(w_{\min} = 0.4\) .
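With these values, Eq. (5) is a one-line schedule; a quick check of the endpoints and the midpoint:

```python
w_max, w_min, T = 0.9, 0.4, 200

def ldiw(t):
    """Linearly decreasing inertia weight, Eq. (5)."""
    return w_max - (w_max - w_min) * (t / T)

print(ldiw(0), ldiw(T // 2), ldiw(T))  # 0.9 at the start, 0.65 midway, 0.4 at the end
```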

Cognitive Coefficient \(c_1\) and Social Coefficient \(c_2\)

\(c_1\) and \(c_2\) weight the pull toward the particle’s own experience and the swarm’s experience, respectively.

  • \(c_1 \gg c_2\) : particles favor their own history, leading to more independent search
  • \(c_1 \ll c_2\) : pull toward the global best dominates, often causing premature convergence

Clerc & Kennedy (2002) analyzed a constriction-coefficient form with \(c_1 = c_2 = 2.05\) , for which the constriction factor is \(\chi \approx 0.7298\) . Rewritten in the inertia-weight form, \(\chi\) scales the whole update, giving \(w = \chi \approx 0.7298\) and effective coefficients \(c_1 = c_2 = \chi \cdot 2.05 \approx 1.496\) . A widely used default rounds this to \(w = 0.729, c_1 = c_2 = 1.4944\) .

Velocity Clipping

To prevent particles from diverging out of the search domain, each velocity component is typically clipped to \(|v_{i,d}| \leq v_{\max}\) , where \(v_{\max}\) is around 10–20% of the search range width.
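For the bounds used later in this article ( \([-5.12, 5.12]\) ) and a 20% ratio, this gives \(v_{\max} = 2.048\) ; the clipping itself is a one-liner in NumPy:

```python
import numpy as np

low, high = -5.12, 5.12
v_max = 0.2 * (high - low)           # 20% of the search range width -> 2.048

v = np.array([3.5, -0.7, -10.0])     # example velocity components
v_clipped = np.clip(v, -v_max, v_max)
print(v_clipped)                     # components outside [-2.048, 2.048] are clamped
```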

Benchmark Functions

Sphere Function (Unimodal)

\[ f_{\text{sphere}}(\mathbf{x}) = \sum_{i=1}^{D} x_i^2 \tag{6} \]

The global minimum is \(f(\mathbf{0}) = 0\) . Being unimodal, PSO converges easily.

Rastrigin Function (Multimodal)

\[ f_{\text{rastrigin}}(\mathbf{x}) = 10 D + \sum_{i=1}^{D} \left[ x_i^2 - 10 \cos(2\pi x_i) \right] \tag{7} \]

The global minimum is again \(f(\mathbf{0}) = 0\) , but the landscape has many local minima. With the wrong inertia weight or coefficients, the swarm may stagnate at a local minimum.
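A quick evaluation illustrates this structure (reusing the `rastrigin` definition from the implementation below): the global minimum sits at the origin, while points at integer coordinates lie near local minima whose value grows with distance from the origin.

```python
import numpy as np

def rastrigin(x):
    return 10 * x.shape[-1] + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=-1)

D = 10
print(rastrigin(np.zeros(D)))  # 0.0  -- the global minimum
print(rastrigin(np.ones(D)))   # 10.0 -- near a local minimum, one unit off per coordinate
```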

Python Implementation

A minimal NumPy-only PSO with linearly decreasing inertia weight and velocity clipping:

import numpy as np
import matplotlib.pyplot as plt


def sphere(x):
    """Sphere function (unimodal)"""
    return np.sum(x**2, axis=-1)


def rastrigin(x):
    """Rastrigin function (multimodal)"""
    return 10 * x.shape[-1] + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=-1)


class PSO:
    def __init__(self, dim, n_particles=40, bounds=(-5.12, 5.12),
                 w_max=0.9, w_min=0.4, c1=1.4944, c2=1.4944,
                 v_max_ratio=0.2):
        self.dim = dim
        self.n_particles = n_particles
        self.bounds = bounds
        self.w_max = w_max
        self.w_min = w_min
        self.c1 = c1
        self.c2 = c2
        # v_max is a fraction of the search range width
        self.v_max = v_max_ratio * (bounds[1] - bounds[0])

    def initialize(self):
        low, high = self.bounds
        x = np.random.uniform(low, high, (self.n_particles, self.dim))
        v = np.random.uniform(-self.v_max, self.v_max,
                              (self.n_particles, self.dim))
        return x, v

    def run(self, objective, max_iter=200):
        x, v = self.initialize()
        f = objective(x)

        # Initialize pbest / gbest
        pbest = x.copy()
        pbest_f = f.copy()
        g_idx = int(np.argmin(pbest_f))
        gbest = pbest[g_idx].copy()
        gbest_f = float(pbest_f[g_idx])

        history = [gbest_f]

        for t in range(max_iter):
            # Linearly decreasing inertia weight
            w = self.w_max - (self.w_max - self.w_min) * (t / max_iter)

            # Velocity update (Eq. 1)
            r1 = np.random.random((self.n_particles, self.dim))
            r2 = np.random.random((self.n_particles, self.dim))
            v = (w * v
                 + self.c1 * r1 * (pbest - x)
                 + self.c2 * r2 * (gbest - x))
            v = np.clip(v, -self.v_max, self.v_max)

            # Position update (Eq. 2)
            x = x + v
            x = np.clip(x, *self.bounds)

            # Evaluate and update pbest / gbest
            f = objective(x)
            improved = f < pbest_f
            pbest[improved] = x[improved]
            pbest_f[improved] = f[improved]

            g_idx = int(np.argmin(pbest_f))
            if pbest_f[g_idx] < gbest_f:
                gbest = pbest[g_idx].copy()
                gbest_f = float(pbest_f[g_idx])

            history.append(gbest_f)

        return gbest, gbest_f, history


# --- Run ---
np.random.seed(42)
pso_sphere = PSO(dim=10)
_, best_sphere, hist_sphere = pso_sphere.run(sphere, max_iter=200)

np.random.seed(42)
pso_rastrigin = PSO(dim=10)
_, best_rastrigin, hist_rastrigin = pso_rastrigin.run(rastrigin, max_iter=200)

print(f"Sphere    best fitness: {best_sphere:.6e}")
print(f"Rastrigin best fitness: {best_rastrigin:.6e}")

# --- Convergence plot ---
plt.figure(figsize=(10, 5))
plt.plot(hist_sphere, label='Sphere (10D)')
plt.plot(hist_rastrigin, label='Rastrigin (10D)')
plt.xlabel('Iteration')
plt.ylabel('Best Objective Value')
plt.title('PSO Convergence on Sphere and Rastrigin')
plt.yscale('log')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

On Sphere, PSO drives the objective down by many orders of magnitude, reaching roughly \(10^{-10}\) in this configuration over the 200 iterations. On Rastrigin, multimodality often causes early stagnation, and the choice of inertia weight, swarm size, and coefficients has a large effect on the final solution.

Effect of Inertia Weight on Convergence

Comparing different fixed \(w\) values is straightforward:

import numpy as np
import matplotlib.pyplot as plt

results = {}
for w in [0.3, 0.5, 0.7, 0.9]:
    np.random.seed(0)
    pso = PSO(dim=10, w_max=w, w_min=w)  # fixed inertia weight
    _, _, history = pso.run(rastrigin, max_iter=200)
    results[w] = history

plt.figure(figsize=(10, 5))
for w, history in results.items():
    plt.plot(history, label=f'w = {w}')
plt.xlabel('Iteration')
plt.ylabel('Best Objective Value')
plt.title('Effect of Inertia Weight on Rastrigin (10D)')
plt.yscale('log')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

Typically, \(w = 0.3\) produces weak global exploration and is prone to local minima, while \(w = 0.9\) converges slowly. A dynamic schedule such as LDIW is usually the most robust default.

Comparison with Other Optimizers

| Method       | Population-based | Main mechanism                             | Multimodality | Expensive evaluations |
|--------------|------------------|--------------------------------------------|---------------|-----------------------|
| PSO          | Yes              | Velocity/position update with pbest, gbest | Medium        | Poor fit              |
| GA           | Yes              | Selection / crossover / mutation           | High          | Poor fit              |
| CEM          | Yes              | Iterative distribution update              | Medium        | Poor fit              |
| SA           | No               | Metropolis acceptance with temperature     | High          | Poor fit              |
| Bayesian Opt | No               | Gaussian process surrogate                 | Medium        | Excellent             |

PSO has very few hyperparameters and a simple implementation, making it practical for medium-dimensional continuous problems. When evaluations are extremely expensive (e.g. real-world experiments), Bayesian optimization that minimizes evaluation count is usually more appropriate.

Summary

  • PSO is built from velocity/position updates (Eqs. 1, 2) and pbest/gbest updates (Eqs. 3, 4)
  • Inertia weight \(w\) , cognitive coefficient \(c_1\) , and social coefficient \(c_2\) weight diversity, self-experience, and swarm-experience respectively
  • Linearly decreasing inertia weight (Eq. 5) and velocity clipping are practically essential for multimodal problems like Rastrigin
  • Convergence is fast on Sphere; on Rastrigin the configuration largely determines the outcome

Reading this together with the rest of the optimization series clarifies which method fits which problem class.

References

  • Kennedy, J., & Eberhart, R. (1995). “Particle Swarm Optimization”. Proceedings of IEEE International Conference on Neural Networks, 4, 1942-1948.
  • Shi, Y., & Eberhart, R. (1998). “A modified particle swarm optimizer”. Proceedings of IEEE International Conference on Evolutionary Computation, 69-73.
  • Clerc, M., & Kennedy, J. (2002). “The particle swarm – explosion, stability, and convergence in a multidimensional complex space”. IEEE Transactions on Evolutionary Computation, 6(1), 58-73.