Particle Swarm Optimization (PSO): Math and Python Implementation with Inertia Weight, Cognitive and Social Coefficients

An in-depth guide to Particle Swarm Optimization (PSO): velocity and position update equations, the role of the inertia weight w and coefficients c1, c2, and a Python implementation benchmarked on Sphere and Rastrigin functions with convergence visualization.

Introduction

Particle Swarm Optimization (PSO) is a metaheuristic optimization technique inspired by the collective behavior of bird flocks and fish schools. Proposed by Kennedy and Eberhart in 1995, it is a gradient-free black-box optimizer that has been widely used for continuous, multimodal problems.

The defining feature of PSO is that each particle maintains a position and a velocity, and is attracted both to its own best historical position (personal best, pbest) and to the best position discovered by the entire swarm (global best, gbest). The swarm collectively converges toward promising regions of the search space much faster than pure random search.

This article reviews the PSO update rules through the lens of the inertia weight \(w\) , cognitive coefficient \(c_1\) , and social coefficient \(c_2\) , and walks through a Python implementation benchmarked on the Sphere and Rastrigin functions with convergence plots.

Algorithm

Particle State

Let the swarm size be \(N\) and the search space dimension be \(D\) . Each particle \(i \in \{1, \dots, N\}\) keeps:

  • position \(\mathbf{x}_i^t \in \mathbb{R}^D\)
  • velocity \(\mathbf{v}_i^t \in \mathbb{R}^D\)
  • personal best \(\mathbf{p}_i^t\) (best position particle \(i\) has visited so far)

The swarm shares a single global best position \(\mathbf{g}^t = \arg\min_{i} f(\mathbf{p}_i^t)\) .
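As a concrete sketch, this state maps naturally onto NumPy arrays (a minimal illustration using the Sphere objective; the array names mirror the implementation later in this article but are otherwise illustrative):

```python
import numpy as np

N, D = 40, 10                       # swarm size and search dimension
low, high = -5.12, 5.12             # search bounds

rng = np.random.default_rng(0)
x = rng.uniform(low, high, (N, D))  # positions  x_i^t, one row per particle
v = np.zeros((N, D))                # velocities v_i^t
pbest = x.copy()                    # personal bests p_i^t (initially the start positions)

f = np.sum(pbest**2, axis=-1)       # objective values, here the Sphere function
g_idx = int(np.argmin(f))           # index of the best particle
gbest = pbest[g_idx]                # shared global best g^t
```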

Velocity Update

The velocity of each particle is updated by:

$$ \mathbf{v}_i^{t+1} = w \, \mathbf{v}_i^{t} + c_1 \mathbf{r}_1 \odot (\mathbf{p}_i^{t} - \mathbf{x}_i^{t}) + c_2 \mathbf{r}_2 \odot (\mathbf{g}^{t} - \mathbf{x}_i^{t}) \tag{1} $$

Here \(\mathbf{r}_1, \mathbf{r}_2 \sim U[0,1]^D\) are independent uniform random vectors and \(\odot\) is the element-wise product. The three terms are:

  • Term 1 \(w \, \mathbf{v}_i^t\) : inertia term, controlling how much of the previous velocity is retained
  • Term 2 \(c_1 \mathbf{r}_1 \odot (\mathbf{p}_i^t - \mathbf{x}_i^t)\) : cognitive term, pulling toward the particle’s own best (pbest)
  • Term 3 \(c_2 \mathbf{r}_2 \odot (\mathbf{g}^t - \mathbf{x}_i^t)\) : social term, pulling toward the swarm’s best (gbest)
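The three terms of Eq. (1) translate directly into a vectorized NumPy update. A minimal sketch (the pbest and gbest values here are dummies, just to show shapes and broadcasting):

```python
import numpy as np

N, D = 40, 10
w, c1, c2 = 0.729, 1.4944, 1.4944

rng = np.random.default_rng(1)
x = rng.uniform(-5.12, 5.12, (N, D))    # current positions
v = rng.uniform(-1.0, 1.0, (N, D))      # current velocities
pbest = x + rng.normal(0, 0.1, (N, D))  # personal bests (dummy values)
gbest = x[0]                            # global best (dummy value), shape (D,)

r1 = rng.random((N, D))                 # r1, r2 ~ U[0,1]^D, drawn fresh each iteration
r2 = rng.random((N, D))
v_new = (w * v                          # inertia term
         + c1 * r1 * (pbest - x)        # cognitive term
         + c2 * r2 * (gbest - x))       # social term; gbest broadcasts over the swarm
```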

Position Update

Position is advanced linearly by the new velocity:

\[ \mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + \mathbf{v}_i^{t+1} \tag{2} \]

Updating pbest and gbest

After each iteration, replace the personal best whenever the current position improves the objective:

\[ \mathbf{p}_i^{t+1} = \begin{cases} \mathbf{x}_i^{t+1} & \text{if } f(\mathbf{x}_i^{t+1}) < f(\mathbf{p}_i^{t}) \\ \mathbf{p}_i^{t} & \text{otherwise} \end{cases} \tag{3} \]

The global best is then refreshed:

\[ \mathbf{g}^{t+1} = \arg\min_{i} f(\mathbf{p}_i^{t+1}) \tag{4} \]
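Eqs. (3) and (4) can be applied to the whole swarm at once with boolean masking. A sketch with dummy values standing in for the post-update positions and previous bests:

```python
import numpy as np

rng = np.random.default_rng(2)
x_new = rng.uniform(-5.12, 5.12, (40, 10))   # positions after Eq. (2)
pbest = rng.uniform(-5.12, 5.12, (40, 10))   # previous personal bests
f = lambda z: np.sum(z**2, axis=-1)          # objective (Sphere)

pbest_f = f(pbest)
f_new = f(x_new)

improved = f_new < pbest_f                   # Eq. (3): replace only where improved
pbest[improved] = x_new[improved]
pbest_f[improved] = f_new[improved]

gbest = pbest[np.argmin(pbest_f)]            # Eq. (4): best of all personal bests
```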

Hyperparameters

Inertia Weight \(w\)

\(w\) governs the trade-off between exploration and exploitation.

  • Large \(w\) (e.g. \(w \approx 0.9\) ): particles strongly retain previous velocity, encouraging wide-area search
  • Small \(w\) (e.g. \(w \approx 0.4\) ): velocity decays quickly, promoting fine-grained local refinement

In practice, a linearly decreasing inertia weight (LDIW) is a widely used default: the swarm explores broadly in early iterations and shifts to local refinement as \(w\) shrinks:

\[ w_t = w_{\max} - (w_{\max} - w_{\min}) \cdot \frac{t}{T_{\max}} \tag{5} \]

Typical values are \(w_{\max} = 0.9\) and \(w_{\min} = 0.4\) .
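With these values, Eq. (5) is a one-line schedule; a quick check of the endpoints and the midpoint:

```python
w_max, w_min, T = 0.9, 0.4, 200

def ldiw(t):
    """Linearly decreasing inertia weight, Eq. (5)."""
    return w_max - (w_max - w_min) * (t / T)

print(ldiw(0), ldiw(T // 2), ldiw(T))  # 0.9 at the start, 0.65 midway, 0.4 at the end
```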

Cognitive Coefficient \(c_1\) and Social Coefficient \(c_2\)

\(c_1\) and \(c_2\) weight the pull toward the particle’s own experience and the swarm’s experience, respectively.

  • \(c_1 \gg c_2\) : particles favor their own history, leading to more independent search
  • \(c_1 \ll c_2\) : pull toward the global best dominates, often causing premature convergence

Clerc & Kennedy (2002) analyzed a constriction-coefficient form with \(c_1 = c_2 = 2.05\) , for which the constriction factor is \(\chi \approx 0.7298\) . Rewritten in the inertia-weight form, \(\chi\) scales the whole update, giving \(w = \chi \approx 0.7298\) and effective coefficients \(c_1 = c_2 = \chi \cdot 2.05 \approx 1.496\) . A widely used default rounds this to \(w = 0.729, c_1 = c_2 = 1.4944\) .

Velocity Clipping

To prevent particles from diverging out of the search domain, each velocity component is typically clipped to \(|v_{i,d}| \leq v_{\max}\) , where \(v_{\max}\) is around 10–20% of the search range width.
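For the bounds used later in this article ( \([-5.12, 5.12]\) ) and a 20% ratio, this gives \(v_{\max} = 2.048\) ; the clipping itself is a one-liner in NumPy:

```python
import numpy as np

low, high = -5.12, 5.12
v_max = 0.2 * (high - low)           # 20% of the search range width -> 2.048

v = np.array([3.5, -0.7, -10.0])     # example velocity components
v_clipped = np.clip(v, -v_max, v_max)
print(v_clipped)                     # components outside [-2.048, 2.048] are clamped
```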

Benchmark Functions

Sphere Function (Unimodal)

\[ f_{\text{sphere}}(\mathbf{x}) = \sum_{i=1}^{D} x_i^2 \tag{6} \]

The global minimum is \(f(\mathbf{0}) = 0\) . Being unimodal, PSO converges easily.

Rastrigin Function (Multimodal)

\[ f_{\text{rastrigin}}(\mathbf{x}) = 10 D + \sum_{i=1}^{D} \left[ x_i^2 - 10 \cos(2\pi x_i) \right] \tag{7} \]

The global minimum is again \(f(\mathbf{0}) = 0\) , but the landscape has many local minima. With the wrong inertia weight or coefficients, the swarm may stagnate at a local minimum.
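A quick evaluation illustrates this structure (reusing the `rastrigin` definition from the implementation below): the global minimum sits at the origin, while points at integer coordinates lie near local minima whose value grows with distance from the origin.

```python
import numpy as np

def rastrigin(x):
    return 10 * x.shape[-1] + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=-1)

D = 10
print(rastrigin(np.zeros(D)))  # 0.0  -- the global minimum
print(rastrigin(np.ones(D)))   # 10.0 -- near a local minimum, one unit off per coordinate
```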

Python Implementation

A minimal NumPy-only PSO with linearly decreasing inertia weight and velocity clipping:

import numpy as np
import matplotlib.pyplot as plt


def sphere(x):
    """Sphere function (unimodal)"""
    return np.sum(x**2, axis=-1)


def rastrigin(x):
    """Rastrigin function (multimodal)"""
    return 10 * x.shape[-1] + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=-1)


class PSO:
    def __init__(self, dim, n_particles=40, bounds=(-5.12, 5.12),
                 w_max=0.9, w_min=0.4, c1=1.4944, c2=1.4944,
                 v_max_ratio=0.2):
        self.dim = dim
        self.n_particles = n_particles
        self.bounds = bounds
        self.w_max = w_max
        self.w_min = w_min
        self.c1 = c1
        self.c2 = c2
        # v_max is a fraction of the search range width
        self.v_max = v_max_ratio * (bounds[1] - bounds[0])

    def initialize(self):
        low, high = self.bounds
        x = np.random.uniform(low, high, (self.n_particles, self.dim))
        v = np.random.uniform(-self.v_max, self.v_max,
                              (self.n_particles, self.dim))
        return x, v

    def run(self, objective, max_iter=200):
        x, v = self.initialize()
        f = objective(x)

        # Initialize pbest / gbest
        pbest = x.copy()
        pbest_f = f.copy()
        g_idx = int(np.argmin(pbest_f))
        gbest = pbest[g_idx].copy()
        gbest_f = float(pbest_f[g_idx])

        history = [gbest_f]

        for t in range(max_iter):
            # Linearly decreasing inertia weight
            w = self.w_max - (self.w_max - self.w_min) * (t / max_iter)

            # Velocity update (Eq. 1)
            r1 = np.random.random((self.n_particles, self.dim))
            r2 = np.random.random((self.n_particles, self.dim))
            v = (w * v
                 + self.c1 * r1 * (pbest - x)
                 + self.c2 * r2 * (gbest - x))
            v = np.clip(v, -self.v_max, self.v_max)

            # Position update (Eq. 2)
            x = x + v
            x = np.clip(x, *self.bounds)

            # Evaluate and update pbest / gbest
            f = objective(x)
            improved = f < pbest_f
            pbest[improved] = x[improved]
            pbest_f[improved] = f[improved]

            g_idx = int(np.argmin(pbest_f))
            if pbest_f[g_idx] < gbest_f:
                gbest = pbest[g_idx].copy()
                gbest_f = float(pbest_f[g_idx])

            history.append(gbest_f)

        return gbest, gbest_f, history


# --- Run ---
np.random.seed(42)
pso_sphere = PSO(dim=10)
_, best_sphere, hist_sphere = pso_sphere.run(sphere, max_iter=200)

np.random.seed(42)
pso_rastrigin = PSO(dim=10)
_, best_rastrigin, hist_rastrigin = pso_rastrigin.run(rastrigin, max_iter=200)

print(f"Sphere    best fitness: {best_sphere:.6e}")
print(f"Rastrigin best fitness: {best_rastrigin:.6e}")

# --- Convergence plot ---
plt.figure(figsize=(10, 5))
plt.plot(hist_sphere, label='Sphere (10D)')
plt.plot(hist_rastrigin, label='Rastrigin (10D)')
plt.xlabel('Iteration')
plt.ylabel('Best Objective Value')
plt.title('PSO Convergence on Sphere and Rastrigin')
plt.yscale('log')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

On Sphere, PSO drives the objective down by many orders of magnitude, reaching roughly \(10^{-10}\) in this configuration over the 200 iterations. On Rastrigin, multimodality often causes early stagnation, and the choice of inertia weight, swarm size, and coefficients has a large effect on the final solution.

Effect of Inertia Weight on Convergence

Comparing different fixed \(w\) values is straightforward:

import numpy as np
import matplotlib.pyplot as plt

results = {}
for w in [0.3, 0.5, 0.7, 0.9]:
    np.random.seed(0)
    pso = PSO(dim=10, w_max=w, w_min=w)  # fixed inertia weight
    _, _, history = pso.run(rastrigin, max_iter=200)
    results[w] = history

plt.figure(figsize=(10, 5))
for w, history in results.items():
    plt.plot(history, label=f'w = {w}')
plt.xlabel('Iteration')
plt.ylabel('Best Objective Value')
plt.title('Effect of Inertia Weight on Rastrigin (10D)')
plt.yscale('log')
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

Typically, \(w = 0.3\) produces weak global exploration and is prone to local minima, while \(w = 0.9\) converges slowly. A dynamic schedule such as LDIW is usually the most robust default.

Comparison with Other Optimizers

| Method       | Population-based | Main mechanism                             | Multimodality | Expensive evaluations |
|--------------|------------------|--------------------------------------------|---------------|-----------------------|
| PSO          | Yes              | Velocity/position update with pbest, gbest | Medium        | Poor fit              |
| GA           | Yes              | Selection / crossover / mutation           | High          | Poor fit              |
| CEM          | Yes              | Iterative distribution update              | Medium        | Poor fit              |
| SA           | No               | Metropolis acceptance with temperature     | High          | Poor fit              |
| Bayesian Opt | No               | Gaussian process surrogate                 | Medium        | Excellent             |

PSO has very few hyperparameters and a simple implementation, making it practical for medium-dimensional continuous problems. When evaluations are extremely expensive (e.g. real-world experiments), Bayesian optimization that minimizes evaluation count is usually more appropriate.

Summary

  • PSO is built from velocity/position updates (Eqs. 1, 2) and pbest/gbest updates (Eqs. 3, 4)
  • Inertia weight \(w\) , cognitive coefficient \(c_1\) , and social coefficient \(c_2\) weight diversity, self-experience, and swarm-experience respectively
  • Linearly decreasing inertia weight (Eq. 5) and velocity clipping are practically essential for multimodal problems like Rastrigin
  • Convergence is fast on Sphere; on Rastrigin the configuration largely determines the outcome

Reading this together with the rest of the optimization series clarifies which method fits which problem class.

References

  • Kennedy, J., & Eberhart, R. (1995). “Particle Swarm Optimization”. Proceedings of IEEE International Conference on Neural Networks, 4, 1942-1948.
  • Shi, Y., & Eberhart, R. (1998). “A modified particle swarm optimizer”. Proceedings of IEEE International Conference on Evolutionary Computation, 69-73.
  • Clerc, M., & Kennedy, J. (2002). “The particle swarm – explosion, stability, and convergence in a multidimensional complex space”. IEEE Transactions on Evolutionary Computation, 6(1), 58-73.