Introduction
This article explains how to use a Genetic Algorithm (GA) to train the weights and biases of a neural network (NN) that approximates a specific function. GA is an optimization method inspired by biological evolution, and it can be effective for complex problems where gradient-based methods are difficult to apply.
The target function to learn is: $$ f(x,y) = \frac{\sin(x^2) / \cos(y) + x^2 - 5y + 30}{80} $$
Genetic Algorithm (GA)
GA is a search algorithm that mimics the mechanisms of biological evolution, particularly the principle of “survival of the fittest.” It represents candidate solutions as a population of “individuals (genes)” and evolves them toward better solutions through repeated genetic operations.
Basic GA Algorithm
- Initialize Population: Randomly generate a population of individuals (in this case, NN weights and biases).
- Evaluate Fitness: Calculate the “fitness” of each individual, measuring how well it solves the problem. Here, lower error between NN output and training data means higher fitness.
- Selection (Reproduction): Select individuals so that those with higher fitness have more opportunities to pass their genes to the next generation.
- Crossover: Create new individuals (offspring) by exchanging parts of the genes between selected pairs. This combines promising elements from different solutions.
- Mutation: With a certain probability, randomly alter parts of an individual’s genes. This promotes escape from local optima and maintains diversity.
- Generational Replacement: Replace the current population with the newly generated individuals.
- Termination Check: Stop if the maximum number of generations is reached or a satisfactory solution is found. Otherwise, return to step 2 (a minimal sketch of this loop is shown after this list).
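The steps above map directly onto a simple loop. Below is a minimal, generic sketch of that loop for illustration; init_population, evaluate, select, crossover, and mutate are placeholder callables, not functions from the implementation later in this article.

def run_ga(init_population, evaluate, select, crossover, mutate, generations):
    # Generic GA skeleton; the five callables are placeholders for illustration
    population = init_population()                              # step 1
    for _ in range(generations):                                # repeat until termination (step 7)
        fitnesses = [evaluate(ind) for ind in population]       # step 2
        new_population = []
        while len(new_population) < len(population):
            parent1 = select(population, fitnesses)             # step 3
            parent2 = select(population, fitnesses)
            child1, child2 = crossover(parent1, parent2)        # step 4
            new_population += [mutate(child1), mutate(child2)]  # step 5
        population = new_population[:len(population)]           # step 6
    return population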
Properties of GA
- Advantage: Since gradient information is not required, GA can be applied to a wide range of problems regardless of differentiability or continuity. It has global search capability and is less likely to get trapped in local optima.
- Challenge: The best individual’s information can be lost through genetic operations (especially crossover); elitism, which carries the best individual into the next generation unchanged and is used in the implementation below, is a common countermeasure. Additionally, many parameters (population size, crossover rate, mutation rate, etc.) need tuning, and convergence is not guaranteed.
Python Implementation
A set of NN weights and biases is treated as one “gene,” and GA is used to optimize it.
Key Parameters
import numpy as np
import math
import random
import matplotlib.pyplot as plt
# Parameter settings
GENERATIONS = 100 # Number of generations
POPULATION_SIZE = 1000 # Population size (number of NNs)
NUM_TEACHER_DATA = 1000 # Number of training data points
# NN structure
NUM_INPUT = 2
NUM_HIDDEN = 2
NUM_OUTPUT = 1
# GA parameters
CROSSOVER_RATE = 0.8 # Crossover rate
MUTATION_RATE = 0.05 # Mutation rate
# Target function
def target_function(x, y):
    # Clamp cos(y) away from zero to avoid divergence when it is near 0
    cos_y = math.cos(y)
    if abs(cos_y) < 1e-6:
        cos_y = 1e-6
    return (math.sin(x*x) / cos_y + x*x - 5*y + 30) / 80

# Activation function
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
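As a quick, illustrative sanity check (not part of the original listing), the length of one gene implied by the 2-2-1 structure above can be computed from these constants, and the target function can be evaluated at a sample point:

# Illustrative check: number of evolvable parameters (gene length) per individual
num_params = (NUM_INPUT * NUM_HIDDEN + NUM_HIDDEN        # input->hidden weights and biases
              + NUM_HIDDEN * NUM_OUTPUT + NUM_OUTPUT)    # hidden->output weights and biases
print(num_params)                 # 2*2 + 2 + 2*1 + 1 = 9
print(target_function(1.0, 2.0))  # sample evaluation of the target function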
Neural Network Class
Defines the NN corresponding to each individual.
class NeuralNetwork:
    def __init__(self):
        # Randomly initialize weights and biases
        self.w_ih = np.random.uniform(-1, 1, (NUM_INPUT, NUM_HIDDEN))
        self.b_h = np.random.uniform(-1, 1, NUM_HIDDEN)
        self.w_ho = np.random.uniform(-1, 1, (NUM_HIDDEN, NUM_OUTPUT))
        self.b_o = np.random.uniform(-1, 1, NUM_OUTPUT)
        self.fitness = 0.0  # Fitness

    def predict(self, x):
        # Forward propagation
        hidden_layer_input = np.dot(x, self.w_ih) + self.b_h
        hidden_layer_output = sigmoid(hidden_layer_input)
        output_layer_input = np.dot(hidden_layer_output, self.w_ho) + self.b_o
        # Output layer uses the identity activation
        return output_layer_input[0]

    def calculate_fitness(self, teacher_inputs, teacher_outputs):
        # Calculate the mean squared error over all training data
        error = 0.0
        for i in range(len(teacher_inputs)):
            prediction = self.predict(teacher_inputs[i])
            error += (prediction - teacher_outputs[i]) ** 2
        mean_squared_error = error / len(teacher_inputs)
        # Define fitness so that lower error = higher fitness
        self.fitness = 1.0 / (mean_squared_error + 1e-9)  # Avoid division by zero
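For illustration, a single individual can be used directly as follows. This is a usage sketch that assumes the constants and functions defined above; the 10-point dataset is only an example.

# Illustrative usage of one individual (assumes the definitions above)
nn = NeuralNetwork()
print(nn.predict(np.array([0.5, -1.0])))   # prediction of an untrained network

tiny_inputs = np.random.uniform(-5, 5, (10, NUM_INPUT))
tiny_outputs = np.array([target_function(x[0], x[1]) for x in tiny_inputs])
nn.calculate_fitness(tiny_inputs, tiny_outputs)
print(nn.fitness)                          # roughly 1 / (MSE + 1e-9)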
GA Class
Implements GA operations (selection, crossover, mutation).
class GeneticAlgorithm:
    def __init__(self):
        self.population = [NeuralNetwork() for _ in range(POPULATION_SIZE)]

    def run_generation(self, teacher_inputs, teacher_outputs):
        # 1. Calculate fitness for all individuals
        for individual in self.population:
            individual.calculate_fitness(teacher_inputs, teacher_outputs)
        # 2. Generate the new generation
        new_population = []
        # Elitism: keep the best individual unchanged
        elite = max(self.population, key=lambda ind: ind.fitness)
        new_population.append(elite)
        while len(new_population) < POPULATION_SIZE:
            # 3. Selection (roulette wheel selection)
            parent1 = self._roulette_selection()
            parent2 = self._roulette_selection()
            # 4. Crossover
            child1, child2 = self._crossover(parent1, parent2)
            # 5. Mutation
            self._mutate(child1)
            self._mutate(child2)
            new_population.extend([child1, child2])
        self.population = new_population[:POPULATION_SIZE]

    def _roulette_selection(self):
        # Selection probability proportional to fitness
        total_fitness = sum(ind.fitness for ind in self.population)
        pick = random.uniform(0, total_fitness)
        current = 0.0
        for individual in self.population:
            current += individual.fitness
            if current > pick:
                return individual
        return self.population[-1]

    def _crossover(self, parent1, parent2):
        child1 = NeuralNetwork()
        child2 = NeuralNetwork()
        if random.random() < CROSSOVER_RATE:
            # Randomly swap parameter sets (uniform crossover, simplified).
            # Arrays are copied so the children never share memory with the
            # parents; otherwise the in-place mutation below would also
            # modify the parents (including the preserved elite).
            child1.w_ih, child2.w_ih = (parent1.w_ih.copy(), parent2.w_ih.copy()) if random.random() < 0.5 else (parent2.w_ih.copy(), parent1.w_ih.copy())
            child1.b_h, child2.b_h = (parent1.b_h.copy(), parent2.b_h.copy()) if random.random() < 0.5 else (parent2.b_h.copy(), parent1.b_h.copy())
            child1.w_ho, child2.w_ho = (parent1.w_ho.copy(), parent2.w_ho.copy()) if random.random() < 0.5 else (parent2.w_ho.copy(), parent1.w_ho.copy())
            child1.b_o, child2.b_o = (parent1.b_o.copy(), parent2.b_o.copy()) if random.random() < 0.5 else (parent2.b_o.copy(), parent1.b_o.copy())
        else:
            # No crossover: children are copies of the parents
            child1.w_ih, child1.b_h, child1.w_ho, child1.b_o = parent1.w_ih.copy(), parent1.b_h.copy(), parent1.w_ho.copy(), parent1.b_o.copy()
            child2.w_ih, child2.b_h, child2.w_ho, child2.b_o = parent2.w_ih.copy(), parent2.b_h.copy(), parent2.w_ho.copy(), parent2.b_o.copy()
        return child1, child2

    def _mutate(self, individual):
        # Perturb each weight/bias array with probability MUTATION_RATE
        for w in [individual.w_ih, individual.b_h, individual.w_ho, individual.b_o]:
            if random.random() < MUTATION_RATE:
                w += np.random.uniform(-0.1, 0.1, w.shape)
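Because fitness is defined as the reciprocal of the MSE, a few individuals can dominate the roulette wheel once their error becomes small. Tournament selection is a common alternative that is less sensitive to the fitness scale; the following is a sketch of a possible drop-in replacement for _roulette_selection, not part of the original implementation (tournament_size is an assumed parameter).

# Illustrative alternative: tournament selection (hypothetical drop-in for _roulette_selection)
def _tournament_selection(self, tournament_size=3):
    # Pick a few individuals at random and keep the fittest one
    candidates = random.sample(self.population, tournament_size)
    return max(candidates, key=lambda ind: ind.fitness)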
Main Function
def main():
    # Generate training data
    teacher_inputs = np.random.uniform(-5, 5, (NUM_TEACHER_DATA, NUM_INPUT))
    teacher_outputs = np.array([target_function(x[0], x[1]) for x in teacher_inputs])
    # Generate test data
    test_inputs = np.random.uniform(-5, 5, (NUM_TEACHER_DATA, NUM_INPUT))
    test_outputs = np.array([target_function(x[0], x[1]) for x in test_inputs])

    ga = GeneticAlgorithm()
    elite_errors = []
    print("Training started...")
    for gen in range(GENERATIONS):
        ga.run_generation(teacher_inputs, teacher_outputs)
        # Find the best individual (elite)
        elite = max(ga.population, key=lambda ind: ind.fitness)
        # Evaluate the elite on the test data
        test_error = 0.0
        for i in range(len(test_inputs)):
            prediction = elite.predict(test_inputs[i])
            test_error += (prediction - test_outputs[i]) ** 2
        mean_squared_error = test_error / len(test_inputs)
        elite_errors.append(mean_squared_error)
        if (gen + 1) % 10 == 0:
            print(f"Generation: {gen + 1}, Test Error (MSE): {mean_squared_error:.6f}")

    # Plot results
    plt.plot(elite_errors)
    plt.title("Elite Individual's Error on Test Data")
    plt.xlabel("Generation")
    plt.ylabel("Mean Squared Error")
    plt.grid(True)
    plt.savefig("ga_nn_learning_curve.png")
    plt.show()

if __name__ == '__main__':
    main()
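To spot-check the trained network after the run, one option (an assumption, not part of the original code) is to modify main() to end with return elite and compare the elite's predictions with the target function at a few points:

# Illustrative check; assumes main() is modified to end with "return elite"
elite = main()
for x, y in [(0.0, 0.0), (2.0, -1.0), (-3.0, 3.0)]:
    pred = elite.predict(np.array([x, y]))
    print(f"f({x}, {y}) = {target_function(x, y):.4f}, NN = {pred:.4f}")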
Experimental Results
The elite individual (highest fitness) from each generation was evaluated on test data, and the mean squared error was plotted. As generations progress, the error decreases, confirming that the NN is learning the function.
