AES and ChaCha20 in Python: Block Cipher Rounds, GF(2^8), AES-GCM vs ChaCha20-Poly1305, and Nonce Pitfalls

AES GCM Python example + ChaCha20-Poly1305 vs AES-GCM comparison with the cryptography library (AESGCM / ChaCha20Poly1305 / Cipher APIs). Covers AES round structure (SubBytes, ShiftRows, MixColumns, AddRoundKey), GF(2^8) math, ECB/CBC/CTR/GCM modes, nonce reuse attacks, ARX design, and a from-scratch AES-128 verified against FIPS-197 test vectors.

Why Learn Symmetric-Key Cryptography?

Symmetric-key cryptography uses the same key for encryption and decryption. Public-key systems like RSA and elliptic curve cryptography (ECC) solve the problem of how to share a key; symmetric ciphers solve the problem of encrypting bulk data fast once a key is shared. With hardware support (AES-NI), AES runs hundreds to thousands of times faster than public-key operations — virtually all the data protected by TLS, disk encryption, and VPNs is protected by symmetric ciphers.

Two algorithms dominate today. AES (Advanced Encryption Standard), standardized by NIST in 2001, encrypts at under one cycle per byte on CPUs with AES-NI instructions. ChaCha20, designed by Daniel J. Bernstein, is a stream cipher that runs fast and in constant time even without hardware support, which is why TLS 1.3 ships exactly three cipher suites: AES-256-GCM, AES-128-GCM, and ChaCha20-Poly1305.

Choosing a good algorithm is not enough, though. The three recurring production failures are picking the wrong mode (ECB/CBC/CTR/GCM), reusing an IV/nonce, and encrypting without authentication. This article works through the AES round structure and its GF(2^8) mathematics, block cipher modes, and the ChaCha20-Poly1305 comparison — then implements (1) production AEAD code with the cryptography library and (2) an educational from-scratch AES-128 in pure Python. For a bird’s-eye view of the whole field, see the Cryptography Roadmap.

Symmetric vs Public-Key: Hybrid Encryption

The big picture first: real systems are hybrid.

AspectSymmetric (AES / ChaCha20)Public key (RSA / ECC)
KeysOne shared secret keyPublic / private key pair
SpeedVery fast (GB/s scale)Slow (100–1000x slower)
Key size128–256 bitsRSA 2048–4096 bits / ECC 256–512 bits
WeaknessKey distributionToo slow for bulk data
Main roleEncrypting the payloadKey exchange, signatures, authentication

The Achilles’ heel of symmetric crypto is key distribution: to send data securely you must first share a key, but you have no secure channel to share it over. Diffie-Hellman key exchange and key encapsulation with RSA / ElGamal break that loop.

A TLS 1.3 connection works in three steps:

  1. Key exchange: ECDHE (X25519 / P-256) establishes an ephemeral shared secret — ephemeral for forward secrecy.
  2. Key derivation: the shared secret goes through HKDF to derive symmetric session keys.
  3. Bulk encryption: application data is encrypted with AES-GCM or ChaCha20-Poly1305 under the session keys.

Public-key crypto handles only the first few hundred bytes; symmetric crypto protects everything else. This article is about step 3.

The AES Round Structure

AES is a block cipher operating on 128-bit (16-byte) blocks with a Substitution-Permutation Network (SPN) structure. The 16-byte internal state is viewed as a 4×4 byte matrix (column-major), transformed by four operations repeated over multiple rounds. The round count depends on the key size:

Key sizeRounds \(N_r\)Round keys
128 bit1011
192 bit1213
256 bit1415

Each round composes four transformations (the final round skips MixColumns).

SubBytes — the nonlinear substitution

Every byte is substituted through the S-box, defined as the composition of the multiplicative inverse in GF(2^8) and an affine transformation over GF(2):

\[ S(x) = A \cdot x^{-1} \oplus c \quad (c = \mathtt{0x63}) \tag{1} \]

with the convention \(0^{-1} = 0\) . The inverse map is the only nonlinear element in AES — resistance against differential and linear cryptanalysis comes from here. The affine layer masks the algebraic simplicity of the inverse (interpolation attacks).

ShiftRows — row rotation

Row \(r\) of the state is rotated left by \(r\) bytes (\(r = 0, 1, 2, 3\) ), spreading the four bytes of each column across four different columns.

MixColumns — column-wise linear map

Each column, viewed as a vector over GF(2^8), is multiplied by a fixed matrix:

\[ \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix} \tag{2} \]

This is an MDS (Maximum Distance Separable) matrix, guaranteeing maximal diffusion: changing one input byte affects all four output bytes. Two rounds of ShiftRows (across rows) plus MixColumns (within columns) achieve full diffusion — one flipped byte influences all 16 bytes of the state.

AddRoundKey — key mixing

A byte-wise XOR of the state with the round key. It is the only key-dependent step, and being XOR, it is its own inverse.

Key schedule (key expansion)

The cipher key expands into \(N_r + 1\) round keys. For AES-128, the 16-byte key forms four words, and word \(w_i\) (\(i \ge 4\) ) is derived as

\[ w_i = w_{i-4} \oplus \begin{cases} \mathrm{SubWord}(\mathrm{RotWord}(w_{i-1})) \oplus \mathrm{Rcon}_{i/4} & (i \equiv 0 \bmod 4) \\ w_{i-1} & (\text{otherwise}) \end{cases} \tag{3} \]

where RotWord is a one-byte rotation, SubWord applies the S-box, and Rcon are the round constants \(x^{j-1} \in \mathrm{GF}(2^8)\) .

Arithmetic in GF(2^8)

The mathematical foundation of AES is the finite (Galois) field of order 256, \(\mathrm{GF}(2^8)\) . A byte \(b_7 b_6 \cdots b_0\) is identified with the polynomial

\[ b(x) = b_7 x^7 + b_6 x^6 + \cdots + b_1 x + b_0 \tag{4} \]

with coefficients in GF(2). Addition is plain XOR (coefficient-wise addition mod 2). Multiplication is polynomial multiplication reduced modulo the irreducible polynomial

\[ m(x) = x^8 + x^4 + x^3 + x + 1 \quad (= \mathtt{0x11B}) \tag{5} \]

Irreducibility of \(m(x)\) is what makes every nonzero element invertible — i.e., what makes this a field. In code, the primitive operation is xtime (multiply by \(x\) : shift left, and XOR with 0x1B on overflow); any product decomposes into shifts and XORs. The inverse can be computed with the extended Euclidean algorithm, or — since every nonzero element satisfies \(a^{255} = 1\) —

\[ a^{-1} = a^{254} \tag{6} \]

via binary exponentiation. Both the S-box (eq. 1) and the MixColumns matrix product (eq. 2) live entirely in this field, and the from-scratch implementation below builds the S-box from equations (1) and (6) rather than pasting the table, to make that structure tangible.

Block Cipher Modes: From the ECB Disaster to GCM

AES by itself is just a function mapping 16 bytes to 16 bytes. Real messages are longer, so we need a mode of operation that chains blocks together — and the choice of mode is what actually decides security in practice.

ECB — the mode you must never use

ECB (Electronic CodeBook) encrypts each block independently:

\[ C_i = E_K(P_i) \tag{7} \]

Identical plaintext blocks always produce identical ciphertext blocks, so data patterns leak straight through the ciphertext. The famous failure image is the ECB penguin: encrypt a bitmap of Tux with ECB and the silhouette remains visible. ECB is not recommended for anything.

CBC — chaining and padding

CBC (Cipher Block Chaining) XORs each plaintext block with the previous ciphertext block before encrypting:

\[ C_i = E_K(P_i \oplus C_{i-1}), \quad C_0 = IV \tag{8} \]

The first block uses an initialization vector (IV), which need not be secret but must be unpredictable (random). Because CBC only processes whole blocks, plaintexts must be padded to a multiple of 16 bytes — typically PKCS#7 (pad \(n\) missing bytes with the value \(n\) ). Exploiting behavioral differences in padding validation gives the padding oracle attack (Vaudenay, 2002), which haunted TLS for years and is the reason TLS 1.3 removed all CBC suites.

CTR — turning a block cipher into a stream cipher

CTR (Counter) mode encrypts a counter to produce a keystream, then XORs it with the plaintext:

\[ C_i = P_i \oplus E_K(\mathrm{nonce} \,\|\, i) \tag{9} \]

No padding, full parallelism, and decryption is the same operation — CTR is the basis of modern modes. But equation (9) makes the failure mode obvious: use the same nonce/counter pair twice and the keystreams coincide, leaking \(C_1 \oplus C_2 = P_1 \oplus P_2\) (the two-time pad problem).

GCM — authenticated encryption (AEAD)

Everything so far provides only confidentiality: tampering goes undetected, and CBC/CTR ciphertexts can be maliciously bit-flipped to alter the plaintext in targeted ways. The modern answer is AEAD (Authenticated Encryption with Associated Data).

GCM (Galois/Counter Mode) combines CTR encryption with an authentication tag computed by GHASH, a polynomial evaluation over \(\mathrm{GF}(2^{128})\) :

\[ T = \mathrm{GHASH}_H(A, C) \oplus E_K(\mathrm{nonce} \,\|\, 0) \tag{10} \]

where \(H = E_K(0^{128})\) is the hash subkey and \(A\) is associated data (AAD) — headers you want integrity-protected but not encrypted. On decryption the tag is recomputed; on mismatch, no plaintext is released at all.

The nonce is GCM’s single point of failure. The nonce is 96 bits, and using the same (key, nonce) pair twice (a) reuses the CTR keystream, leaking the XOR of the plaintexts, and (b) allows algebraic recovery of the GHASH subkey \(H\) , after which all future tags can be forged (Joux’s “forbidden attack”). Manage nonces as counters (message numbers), or if generating randomly, keep the number of encryptions per key below \(2^{32}\) . If misuse resistance is a requirement, consider AES-GCM-SIV.

ModeConfidentialityIntegrityPaddingParallelNotes
ECBNeededYesNever use
CBCNeededDecrypt onlyPadding-oracle breeding ground
CTRNoneYesInstant failure on nonce reuse
GCMNoneYesThe modern standard (AEAD)

ChaCha20-Poly1305: an ARX Stream Cipher

ChaCha20 is Bernstein’s refinement of Salsa20; paired with the Poly1305 MAC it forms ChaCha20-Poly1305, standardized in RFC 8439 and adopted by TLS 1.3, SSH, and WireGuard as the other de facto AEAD alongside AES-GCM.

The ARX design

ChaCha20’s state is a 4×4 matrix of sixteen 32-bit words (512 bits), initialized with constants ("expand 32-byte k"), a 256-bit key, a counter, and a 96-bit nonce. The only primitive operations are ARX: Addition, Rotation, XOR:

\[ a \mathrel{+}= b;\quad d \mathrel{\oplus}= a;\quad d \lll 16 \tag{11} \]

Four such steps on four words form a quarter round; quarter rounds are applied alternately down columns and diagonals for 20 rounds, and the initial state is added back to produce a 64-byte keystream block. There are no table lookups analogous to the AES S-box, so ChaCha20 is constant-time by construction against cache-timing attacks.

AES-GCM vs ChaCha20-Poly1305

AspectAES-256-GCMChaCha20-Poly1305
TypeBlock cipher + GCM modeStream cipher + Poly1305
PrimitivesS-box, GF(2^8) / GF(2^128)ARX (add, rotate, XOR)
Key / nonce256 bit / 96 bit256 bit / 96 bit
With AES-NIExtremely fast (>1 GB/s)Fast
Without AES-NISlow + cache-timing concerns (table lookups)Fast and constant-time
Best hardwareServers, desktopsMobile, embedded, IoT
StandardNIST SP 800-38DRFC 8439
Deployed inTLS, IPsec, disk encryptionTLS, SSH, WireGuard, Signal

The rule of thumb: AES-GCM when you have AES-NI, ChaCha20-Poly1305 when you don’t. Software AES leaks cache-access patterns that depend on secret data, while ChaCha20 needs only basic CPU instructions — this is why Google pushed ChaCha20 suites for Android. Security-wise, no practical attack is known against either as of 2026; they are equally mature choices.

Python Implementation

1. Production: AESGCM / ChaCha20Poly1305 from the cryptography library

Production code should use the high-level AEAD API in cryptography.hazmat.primitives.ciphers.aead. There is no mode to choose and no padding to manage — the API is hard to misuse.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM, ChaCha20Poly1305

# --- AES-256-GCM ---
key = AESGCM.generate_key(bit_length=256)  # os.urandom under the hood
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # 96 bits; never reuse under the same key
aad = b"header: message-id=42"  # authenticated but not encrypted

ciphertext = aesgcm.encrypt(nonce, b"secret message", aad)
plaintext = aesgcm.decrypt(nonce, ciphertext, aad)  # raises InvalidTag on mismatch
assert plaintext == b"secret message"

# Ciphertext includes the 16-byte tag, so it is 16 bytes longer than the plaintext
print(f"Ciphertext length: {len(ciphertext)} bytes (14 plaintext + 16 tag)")

# --- ChaCha20-Poly1305: the API is exactly parallel ---
key2 = ChaCha20Poly1305.generate_key()  # always 256 bit
chacha = ChaCha20Poly1305(key2)
nonce2 = os.urandom(12)

ct2 = chacha.encrypt(nonce2, b"secret message", aad)
assert chacha.decrypt(nonce2, ct2, aad) == b"secret message"

# --- Tamper-detection demo ---
from cryptography.exceptions import InvalidTag

tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
try:
    aesgcm.decrypt(nonce, tampered, aad)
except InvalidTag:
    print("Tampering detected: no plaintext is released")

Three production rules:

  1. Generate keys with AESGCM.generate_key() or os.urandom(32) — never the random module, which is a predictable PRNG.
  2. Generate a fresh 12-byte nonce per message (or manage a counter). Nonces may be stored and transmitted in the clear; the only sin is reuse under the same key.
  3. Avoid the low-level API (Cipher with modes.CBC etc.): assembling CBC/CTR by hand from cryptography.hazmat.primitives.ciphers means owning padding and tag management yourself. Prefer the AEAD classes whenever they suffice.

Nonce reuse is worth internalizing: since GCM and ChaCha20 are CTR-style, encrypting two plaintexts under the same (key, nonce) yields ct1 XOR ct2 == pt1 XOR pt2, exposing plaintext structure directly.

2. Educational: AES-128 round functions from scratch

For learning, we build the S-box from its definition (equations 1 and 6), not from the published table, and implement full AES-128 single-block encryption in pure standard-library Python, verified against the official FIPS-197 Appendix C.1 test vector. (Educational only — no side-channel protection whatsoever; never use in production.)

# Educational: AES-128 round functions from scratch (FIPS-197)

# --- Multiplication in GF(2^8): irreducible poly x^8 + x^4 + x^3 + x + 1 (0x11B) ---
def gf_mul(a, b):
    """GF(2^8) multiplication: repeated shift and XOR (carry-less)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a          # add (XOR) when the coefficient is 1
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B       # replace x^8 by x^4 + x^3 + x + 1
    return r


def gf_inv(a):
    """Inverse in GF(2^8): a^254 (eq. 6), via binary exponentiation."""
    if a == 0:
        return 0
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


# --- Building the S-box: inverse + affine transformation (eq. 1) ---
def make_sbox():
    sbox = []
    for x in range(256):
        b, c = gf_inv(x), 0x63
        y = 0
        for i in range(8):
            bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
                   ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (c >> i)) & 1
            y |= bit << i
        sbox.append(y)
    return sbox


SBOX = make_sbox()
assert SBOX[0x00] == 0x63 and SBOX[0x53] == 0xED  # known values from FIPS-197


# --- The four round functions (state = 16-byte list, column-major) ---
def sub_bytes(state):
    """SubBytes: substitute every byte via the S-box (nonlinear layer)."""
    return [SBOX[b] for b in state]


def shift_rows(state):
    """ShiftRows: rotate row r left by r bytes (inter-column diffusion)."""
    return [state[(i + 4 * (i % 4)) % 16] for i in range(16)]


def mix_columns(state):
    """MixColumns: multiply each column by the MDS matrix (eq. 2) over GF(2^8)."""
    out = []
    for c in range(4):
        col = state[4 * c:4 * c + 4]
        out += [
            gf_mul(col[0], 2) ^ gf_mul(col[1], 3) ^ col[2] ^ col[3],
            col[0] ^ gf_mul(col[1], 2) ^ gf_mul(col[2], 3) ^ col[3],
            col[0] ^ col[1] ^ gf_mul(col[2], 2) ^ gf_mul(col[3], 3),
            gf_mul(col[0], 3) ^ col[1] ^ col[2] ^ gf_mul(col[3], 2),
        ]
    return out


def add_round_key(state, rk):
    """AddRoundKey: XOR with the round key (key mixing)."""
    return [s ^ k for s, k in zip(state, rk)]


# --- Key schedule (eq. 3): AES-128 has 10 rounds, 11 round keys ---
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def key_expansion(key):
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]              # RotWord
            temp = [SBOX[b] for b in temp]          # SubWord
            temp[0] ^= RCON[i // 4 - 1]             # Rcon
        words.append([t ^ w for t, w in zip(temp, words[i - 4])])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def aes128_encrypt_block(plaintext, key):
    """Encrypt one 16-byte block with AES-128."""
    round_keys = key_expansion(key)
    state = add_round_key(list(plaintext), round_keys[0])
    for r in range(1, 10):
        state = sub_bytes(state)
        state = shift_rows(state)
        state = mix_columns(state)
        state = add_round_key(state, round_keys[r])
    # Final round has no MixColumns
    state = sub_bytes(state)
    state = shift_rows(state)
    state = add_round_key(state, round_keys[10])
    return bytes(state)


# --- Verify against the FIPS-197 Appendix C.1 test vector ---
key = bytes(range(16))                     # 000102...0e0f
pt = bytes.fromhex("00112233445566778899aabbccddeeff")
ct = aes128_encrypt_block(pt, key)
print(f"Ciphertext: {ct.hex()}")
assert ct.hex() == "69c4e0d86a7b0430d8cdb78070b4c55a"  # FIPS-197 expected value
print("Matches the FIPS-197 test vector: implementation is correct")

Output:

Ciphertext: 69c4e0d86a7b0430d8cdb78070b4c55a
Matches the FIPS-197 test vector: implementation is correct

Three takeaways:

  1. The S-box falls out of the definitions: make_sbox() uses only the affine map (eq. 1) and the inverse (eq. 6), yet reproduces the FIPS-197 table exactly. The S-box is not a magic constant — it is the structure of GF(2^8).
  2. gf_inv is binary exponentiation: computing \(a^{254}\) uses the same binary-expansion trick as RSA’s modular exponentiation and ECC scalar multiplication — one algorithm threading through all of cryptography.
  3. Why this must never ship: the SBOX[b] lookups touch memory at secret-dependent addresses, making them a target for cache-timing attacks (Bernstein, 2005). AES-NI and bitsliced implementations exist precisely to avoid this.

Key Sizes and Security Levels

Brute-forcing a \(k\) -bit symmetric key takes \(2^k\) trials, and this \(k\) defines the security level. Public-key schemes face mathematical attacks (number field sieve, Pollard’s rho), so they need much longer keys for the same level (see also the key-size comparison in the ECC article).

Security levelSymmetricRSA / DHECCStatus
112 bit3DES (retired)2048 bit224 bitCurrent minimum
128 bitAES-1283072 bit256 bitThe modern standard
192 bitAES-1927680 bit384 bitHigh security
256 bitAES-25615360 bit512 bitLong-term / top secret

Quantum computing affects the two worlds asymmetrically. Shor’s algorithm breaks RSA / ECC in polynomial time, but the best quantum attack on symmetric ciphers is Grover’s algorithm, which gives only a square-root speedup — effectively halving the security level (AES-128 → 64-bit effective, AES-256 → 128-bit effective). This is why the advice is “choose AES-256 if you are planning for the quantum era”: symmetric ciphers survive the post-quantum (PQC) transition essentially unchanged.

AES-192 is rare in practice simply because most protocols (including TLS) only define 128 and 256-bit variants. The 2009 related-key attacks on AES-256 remain of theoretical interest only.

Summary

  • Symmetric ciphers are the payload half of hybrid encryption: DH / RSA / ECC establish the shared key, then AES / ChaCha20 encrypt the bulk data fast.
  • AES is an SPN repeating SubBytes / ShiftRows / MixColumns / AddRoundKey for 10–14 rounds, with every operation defined over GF(2^8) (irreducible polynomial \(x^8 + x^4 + x^3 + x + 1\) ). The S-box is derivable from the inverse \(a^{254}\) plus an affine map.
  • The mode decides real-world security: never use ECB, CBC invites padding oracles, and for CTR / GCM nonce reuse is instantly fatal. The modern standards are the AEADs AES-GCM and ChaCha20-Poly1305.
  • ChaCha20’s ARX design has no table lookups, making it faster than AES and constant-time on hardware without AES-NI. Rule of thumb: AES-GCM with AES-NI, ChaCha20-Poly1305 without.
  • In Python, use AESGCM / ChaCha20Poly1305 from cryptography.hazmat.primitives.ciphers.aead. Keys via generate_key(), a fresh 12-byte nonce per message, and skip the low-level Cipher API.
  • Security levels line up as AES-128 ≈ RSA-3072 ≈ ECC-256. Grover’s algorithm halves effective symmetric strength, so pick AES-256 for long-term secrets.

References

  • NIST (2001). “Advanced Encryption Standard (AES)”. FIPS PUB 197.
  • Daemen, J., & Rijmen, V. (2002). The Design of Rijndael: AES — The Advanced Encryption Standard. Springer.
  • Dworkin, M. (2007). “Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC”. NIST SP 800-38D.
  • Nir, Y., & Langley, A. (2018). “ChaCha20 and Poly1305 for IETF Protocols”. RFC 8439.
  • Bernstein, D. J. (2005). “Cache-timing attacks on AES”.
  • Joux, A. (2006). “Authentication Failures in NIST version of GCM”. Comments on NIST SP 800-38D draft.
  • Vaudenay, S. (2002). “Security Flaws Induced by CBC Padding — Applications to SSL, IPSEC, WTLS…”. EUROCRYPT 2002.