Reinforcement Learning for ROBOTIS OP3 Walking: ROS Package Implementation

Explains the ROS package implementation that enables the ROBOTIS OP3 humanoid robot to learn walking via DQN in Gazebo simulation.

Introduction

This article explains the code of a ROS (Robot Operating System) package that enables the ROBOTIS OP3 humanoid robot to acquire walking locomotion using reinforcement learning within the Gazebo simulation environment.

Result Video

The learned walking motion of OP3 can be seen in the following video:

Method Description

This project uses a Deep Q-Network (DQN). DQN combines Q-learning with deep learning by approximating the action-value function with a neural network.

The action-value function $Q(s_t, a_t)$ is defined as a three-layer neural network and is updated based on the following Q-learning update rule:

$ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta \left( R_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right) $

Here, $\eta$ is the learning rate, $R_{t+1}$ is the immediate reward, and $\gamma$ is the discount factor.

The neural network is updated using backpropagation with the following loss function $L$:

$ L = \mathbb{E}\left[ \left( R_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right)^2 \right] $
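Since the network in this package is defined with PyTorch, the loss above can be computed roughly as follows. This is a minimal sketch with assumed names and tensor shapes (QNetwork, dqn_loss, the hidden-layer size), not the package's exact code:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Hypothetical three-layer Q-network mapping a state vector to one Q-value per discrete action."""
    def __init__(self, num_states, num_actions, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_states, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, x):
        return self.net(x)

def dqn_loss(q_net, batch, gamma=0.99):
    """TD loss: L = E[(R_{t+1} + gamma * max_a' Q(s_{t+1}, a') - Q(s_t, a_t))^2]."""
    # Assumed batch layout: states [B, S], actions [B, 1] (int64), rewards [B, 1], next_states [B, S].
    states, actions, rewards, next_states = batch
    q_sa = q_net(states).gather(1, actions)                       # Q(s_t, a_t) for the taken actions
    with torch.no_grad():                                         # target is not backpropagated through
        max_next_q = q_net(next_states).max(1, keepdim=True)[0]   # max_a' Q(s_{t+1}, a')
    target = rewards + gamma * max_next_q                         # R_{t+1} + gamma * max_a' Q(s_{t+1}, a')
    return nn.functional.mse_loss(q_sa, target)
```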

Program Structure

This ROS package primarily consists of the following Python scripts:

1. function.py and motion.py

  • function.py: Contains the basic definitions for the reinforcement learning agent.
    • Agent class: Encapsulates a Brain class that defines the neural network.
    • ReplayMemory class: Stores the experiences (states, actions, rewards, and next states) that the agent collects from the environment. The Brain samples mini-batches from this memory to compute the loss and update the neural network.
    • Actions are discretized, and one is selected with the epsilon-greedy method (a simplified sketch of these components follows below).
  • motion.py: Defines the robot's discrete actions (e.g., the target angles for each joint).

These scripts are based on the code from the following book:
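For orientation, a minimal, simplified sketch of how a replay memory and epsilon-greedy action selection are typically implemented is shown below. The class names, the epsilon decay schedule, and the helper select_action are illustrative assumptions and are not copied from the package or from the referenced book:

```python
import random
from collections import namedtuple

Transition = namedtuple('Transition', ('state', 'action', 'next_state', 'reward'))

class ReplayMemory:
    """Fixed-size buffer of transitions that the Brain samples from for its loss calculation."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.index = 0

    def push(self, *args):
        if len(self.memory) < self.capacity:
            self.memory.append(None)
        self.memory[self.index] = Transition(*args)
        self.index = (self.index + 1) % self.capacity  # overwrite the oldest entry when full

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

def select_action(q_values, episode, num_actions):
    """Epsilon-greedy selection over the discretized actions (decay schedule is an assumption)."""
    epsilon = 0.5 * (1.0 / (episode + 1))
    if random.random() < epsilon:
        return random.randrange(num_actions)  # explore: random discrete action
    return int(q_values.argmax())             # exploit: action with the highest Q-value
```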

2. learning.py

  • learning.py: Inherits from the Agent class defined in function.py and runs as a ROS node.
    • Subscribes to the robot states published as ROS topics by controller.py.
    • Computes an action from the received state and publishes it as a ROS topic.
    • Uses PyTorch to define the neural network and therefore must be run with Python 3.
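As an outline of this structure, the learning-side node might look roughly like the following. The topic names (/op3/state, /op3/action), the message types, and the LearningNode class are assumptions for illustration, not necessarily what the package uses:

```python
#!/usr/bin/env python3
import rospy
from std_msgs.msg import Float64MultiArray, Int32

class LearningNode:
    """Receives robot states from controller.py and publishes the chosen discrete action."""
    def __init__(self):
        rospy.init_node('learning')
        # Topic names below are illustrative assumptions.
        self.action_pub = rospy.Publisher('/op3/action', Int32, queue_size=1)
        rospy.Subscriber('/op3/state', Float64MultiArray, self.state_callback)

    def state_callback(self, msg):
        state = list(msg.data)               # joint angles, center-of-mass position, etc.
        action = self.decide_action(state)   # epsilon-greedy choice over the Q-network output
        self.action_pub.publish(Int32(data=action))

    def decide_action(self, state):
        # In the actual package this would call the Agent/Brain defined in function.py.
        return 0  # placeholder

if __name__ == '__main__':
    LearningNode()
    rospy.spin()
```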

3. controller.py

  • controller.py: Subscribes to actions published by learning.py as ROS topics and controls the ROBOTIS OP3 in the Gazebo simulation.
    • Publishes the robot’s current state (joint angles, center of mass position, etc.) as ROS topics.
    • Due to dependencies of the OP3 ROS package, this script needs to be run in Python 2.
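A simplified sketch of the controller side of this bridge is shown below. Again, the topic names, message types, and joint-command topic are illustrative assumptions; the actual script drives the OP3 in Gazebo through the OP3 ROS packages:

```python
#!/usr/bin/env python2
import rospy
from std_msgs.msg import Float64MultiArray, Int32, Float64

class ControllerNode(object):
    """Applies actions from learning.py to the simulated OP3 and reports the robot's state."""
    def __init__(self):
        rospy.init_node('controller')
        self.state_pub = rospy.Publisher('/op3/state', Float64MultiArray, queue_size=1)
        # One position-command topic per joint in the real setup; a single example joint here.
        self.joint_pub = rospy.Publisher('/robotis_op3/r_knee_position/command',
                                         Float64, queue_size=1)
        rospy.Subscriber('/op3/action', Int32, self.action_callback)

    def action_callback(self, msg):
        target_angles = self.lookup_motion(msg.data)   # discrete action index -> joint targets
        self.joint_pub.publish(Float64(data=target_angles[0]))

    def lookup_motion(self, action_index):
        # Placeholder for the per-action joint-angle table defined in motion.py.
        return [0.0]

    def publish_state(self, joint_angles, com_position):
        # Flatten the joint angles and center-of-mass position into one state vector.
        self.state_pub.publish(Float64MultiArray(data=list(joint_angles) + list(com_position)))

if __name__ == '__main__':
    node = ControllerNode()
    rospy.spin()
```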

Learning Curve

The following graph shows how the walking distance changes as learning progresses. The distance increases over the generations, indicating that the agent is learning more effective locomotion.

[Figure: Walking Distance per generation]