Time Series Forecasting Fundamentals: ARIMA Theory and Python Implementation

From AR, MA, ARIMA model definitions to parameter selection via ACF/PACF, statsmodels implementation, and SARIMA for seasonal data.

Introduction

Time series forecasting is essential across domains including demand planning, financial analysis, and weather prediction. ARIMA (AutoRegressive Integrated Moving Average) is a classical yet powerful time series forecasting method.

This article explains the mathematical foundations of ARIMA’s components (AR, MA, differencing) and demonstrates Python implementation with statsmodels.

AR (AutoRegressive) Model

The AR(\(p\)) model expresses the current value using the past \(p\) values:

\[X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t \tag{1}\]

where \(\phi_i\) are autoregressive coefficients and \(\varepsilon_t \sim \mathcal{N}(0, \sigma^2)\) is white noise.

Stationarity condition: All roots of \(1 - \phi_1 z - \cdots - \phi_p z^p = 0\) must lie outside the unit circle.

MA (Moving Average) Model

The MA(\(q\)) model is a linear combination of past \(q\) noise terms:

\[X_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{2}\]

MA models are always stationary.

ARMA Model

Combining AR and MA gives ARMA(\(p, q\)):

\[X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{3}\]

ARIMA Model

For non-stationary series, apply \(d\) differences to achieve stationarity, then fit ARMA. This is ARIMA(\(p, d, q\)):

\[\Delta^d X_t = c + \sum_{i=1}^{p} \phi_i \Delta^d X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{4}\]

where \(\Delta X_t = X_t - X_{t-1}\) is the first difference.

  • \(p\): autoregressive order
  • \(d\): differencing order (typically 0, 1, or 2)
  • \(q\): moving average order

Parameter Selection

ACF and PACF

PatternACFPACFModel
AR(\(p\))Gradual decaySharp cutoff at lag \(p\)Use PACF for p
MA(\(q\))Sharp cutoff at lag \(q\)Gradual decayUse ACF for q
ARMA(\(p,q\))Gradual decayGradual decayUse AIC/BIC

Information Criteria

Use AIC or BIC to select the optimal \((p, d, q)\) combination:

\[\text{AIC} = -2\ln(L) + 2k \tag{5}\]

where \(L\) is maximum likelihood and \(k\) is the number of parameters.

Python Implementation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# --- Generate data (AR(2) process + trend) ---
np.random.seed(42)
n = 300
noise = np.random.normal(0, 1, n)
data = np.zeros(n)
for t in range(2, n):
    data[t] = 0.5 * data[t-1] - 0.3 * data[t-2] + noise[t]
data += np.linspace(0, 10, n)

ts = pd.Series(data, index=pd.date_range('2024-01-01', periods=n, freq='D'))

# --- Stationarity test (ADF test) ---
result = adfuller(ts)
print(f"ADF statistic: {result[0]:.4f}, p-value: {result[1]:.4f}")

ts_diff = ts.diff().dropna()
result_diff = adfuller(ts_diff)
print(f"After 1st diff - ADF: {result_diff[0]:.4f}, p-value: {result_diff[1]:.4f}")

# --- ACF/PACF plots ---
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(ts_diff, ax=axes[0], lags=20, title='ACF (1st Difference)')
plot_pacf(ts_diff, ax=axes[1], lags=20, title='PACF (1st Difference)')
plt.tight_layout()
plt.show()

# --- ARIMA fitting ---
train = ts[:250]
test = ts[250:]

model = ARIMA(train, order=(2, 1, 0))
fitted = model.fit()
print(fitted.summary())

# --- Forecast ---
forecast = fitted.forecast(steps=len(test))

plt.figure(figsize=(12, 5))
plt.plot(train.index, train, label='Train')
plt.plot(test.index, test, label='Test')
plt.plot(test.index, forecast, 'r--', label='Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('ARIMA(2,1,0) Forecast')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

SARIMA (Seasonal ARIMA)

For seasonal data, use SARIMA(\(p, d, q\))(\(P, D, Q\))\(_s\):

\[\Phi_P(B^s) \phi_p(B) \Delta^d \Delta_s^D X_t = \Theta_Q(B^s) \theta_q(B) \varepsilon_t \tag{6}\]

where \(s\) is the seasonal period (12 for monthly data) and \(B\) is the backshift operator.

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(train, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
fitted = model.fit(disp=False)

Automated Model Selection

Use pmdarima’s auto_arima for automatic parameter search:

import pmdarima as pm

auto_model = pm.auto_arima(train, seasonal=False,
                           stepwise=True, suppress_warnings=True)
print(auto_model.summary())

References

  • Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control (5th ed.). Wiley.
  • Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.).
  • statsmodels ARIMA documentation