Rice Robotics R&D

2025 - 2026

Reinforcement-Learned gait optimization for a laser-pointer-controlled feline quadruped robot.

Role

Software Co-Lead

Timeline

August 2025 - May 2026

Status

Ongoing

First simulation run of the servobot quadruped gait policy. Unstable but promising first steps!

Project Description

I am currently serving as the Software Co-Lead for the Rice Robotics team's quadruped robot project. Our goal is to build a cat-inspired robot that autonomously navigates toward targets set by a laser-pointer-inspired targeting controller. To do this, we are using reinforcement learning (RL) to optimize the robot's gait and locomotion strategies.

Background

This project started around two years ago as a mechanical design challenge: create a leg capable of powerful 'jumping' motions. Over time, the project and team have grown significantly, evolving into a full quadruped robot platform with advanced locomotion capabilities. The robot's design is inspired by feline biology and aims to replicate the agility and fluid movement of cats.

As the Software Co-Lead since August 2025, my responsibilities include overseeing the development of the robot's control systems, implementing reinforcement learning algorithms for gait optimization, and integrating sensor feedback for real-time navigation. The project combines elements of mechanical engineering, computer science, and robotics to create a sophisticated autonomous system.

This project is cool for a number of reasons. Designing an agility-focused quadruped like this is an unusual challenge that reinforcement learning is particularly well suited to, and it brings a high level of complexity in both the computational and the physical engineering aspects of the problem. Additionally, the team at Rice Robotics is made up of a talented, interdisciplinary, and incredibly passionate group of undergraduate and graduate students who are all dedicated to pushing this project as far as we can.

Research Objectives

The main objective of my work so far has been to learn and implement a functional system that can effectively train the quadruped robot in a simulated environment, transfer the learned behavior to the physical system, and run in real time on the robot hardware. This involves several key steps:

Goals: We are utilizing the cutting-edge RL training & physics simulation framework Genesis to create a high-fidelity simulation of our quadruped robot. The simulation must accurately model the robot's dynamics, sensor feedback, and interaction with the environment to ensure effective training. Our simulation must utilize techniques like command delay and domain randomization to ensure that the robot's learned behaviors transfer effectively to the real world. This includes modeling real-world noise, latency, and variability in the simulation.

We also want to develop a privileged-unprivileged teacher-student training setup to improve sim-to-real transfer. Our final design will not have access to the same level of sensor data that we can provide in sim, so we are working to use the richer data available in sim to train a more powerful policy that can then be distilled down to run on the real robot with limited sensing.

Finally, we need to implement an efficient onboard inference system that can run this trained policy in real-time on the robot's hardware within a ROS2 node. This is something we've done only a little bit of so far, but will be a key part of testing how well our domain randomization and other sim2real techniques have worked.

Success Metrics: A successful implementation should be able to:

1) Train a robust locomotion policy in simulation that can navigate to target locations indicated by a laser pointer
2) Transfer the learned policy to the physical robot with minimal performance degradation
3) Run the policy in real-time on the robot hardware, enabling fall recovery and more efficient locomotion across a variety of surfaces.

Robot Platform & System Design

The quadruped is a 12-motor robot with 3 servo-driven degrees of freedom per leg (1 hip motor, 2 knee motors). The main challenge of this project has been the constraints imposed by our team's limited access to hardware: the only available perception system on our current design is a single IMU (Inertial Measurement Unit). This has made state estimation and locomotion particularly challenging. We have to rely on the servos to follow our commands accurately within some margin of error, and can only get information on the body's orientation (from the IMU's magnetometer) and acceleration (from the IMU's accelerometer).

This constraint has made it necessary to design a system that can essentially predict the things we cannot directly measure, such as foot contact and exact joint positions and velocities. Obviously this introduces a lot of uncertainty, but it has also made the project substantially more interesting from a computational perspective.

The robot's computational platform consists of a Raspberry Pi 5 running ROS2 Jazzy on Ubuntu. The Pi interfaces directly with the servos and communicates with the IMU over I2C. The RL policy, Bluetooth controller interface, and joint state estimation nodes all run in parallel as ROS2 nodes on the Pi. This approach keeps development modular and lets us integrate more advanced perception systems in the future without a substantial redesign.

While the current prototype we are testing on (named "Servobot") is extremely rudimentary, our mechanical and electrical teams are hard at work designing the full robot platform that will eventually run all of this same software. The final design will have a body shape inspired by feline anatomy, with a lightweight frame and a complex linkage system for the legs. Servobot is primarily a testbed for our software development, allowing us to iterate quickly on the control systems and RL training while the full robot design is being finalized.

Software Architecture

Our training environment is built using the Genesis framework, which provides a powerful and fast physics simulation and RL training pipeline for robotics applications. We have built a custom robot model in Genesis that matches the dimensions and dynamics of Servobot (and, later, of the full robot, Catbot) as closely as possible.

Control System

Our RL system outputs desired joint positions for each of the 12 servos at a fixed frequency. These commands are sent directly to the servos via the Raspberry Pi's GPIO interface. A low-level PID controller on each servo ensures that the commanded positions are tracked as accurately as possible given the hardware constraints (which usually requires us to recalibrate the robot before each use). The frequency of command updates will need to be adjusted to balance responsiveness with computational load on the Pi, but we're not quite there yet so that is a future problem to solve.
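As a rough illustration of the last hop in this chain, the sketch below converts 12 joint targets into per-servo pulse widths. It assumes PWM-driven hobby servos, which the text above does not state outright. The pulse and angle limits, the function names, and the per-joint calibration offsets are all hypothetical placeholders rather than Servobot's actual values.

```python
import math

# Hypothetical servo characteristics; substitute the real datasheet values.
PULSE_MIN_US = 500.0      # pulse width at the lower angle limit
PULSE_MAX_US = 2500.0     # pulse width at the upper angle limit
ANGLE_MIN_RAD = -math.pi / 2
ANGLE_MAX_RAD = math.pi / 2


def angle_to_pulse_us(angle_rad, calibration_offset_rad=0.0):
    """Map one joint angle to a pulse width, clamped to the servo's range."""
    corrected = angle_rad + calibration_offset_rad
    clamped = min(max(corrected, ANGLE_MIN_RAD), ANGLE_MAX_RAD)
    fraction = (clamped - ANGLE_MIN_RAD) / (ANGLE_MAX_RAD - ANGLE_MIN_RAD)
    return PULSE_MIN_US + fraction * (PULSE_MAX_US - PULSE_MIN_US)


def joint_targets_to_pulses(joint_targets_rad, calibration_offsets_rad):
    """Convert every policy output into a pulse width for its servo."""
    if len(joint_targets_rad) != len(calibration_offsets_rad):
        raise ValueError("expected one calibration offset per joint")
    return [
        angle_to_pulse_us(target, offset)
        for target, offset in zip(joint_targets_rad, calibration_offsets_rad)
    ]
```

Clamping before conversion means an out-of-range policy output saturates at the servo limit instead of producing an invalid pulse. The offsets give the pre-run recalibration a single place to live.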

Perception & State Estimation

In terms of perception, the current system only has the IMU to work with. We are working on modeling the motor movement speed well enough to get rough estimates of joint positions and velocities from the commanded positions and the time elapsed, but this introduces a lot of uncertainty. Terrain slowing down the joints will cause significant drift in these estimates over time. To help mitigate this, we are exploring an electrical solution that measures the resistance of the servo to estimate load, which could give us some insight into foot contact events and be fed directly to the neural policy as an additional input. This is still very much a work in progress. Having actual motor encoders would make this much easier, but we're trying to work within the constraints of our current hardware.
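One simple way to picture the open-loop joint estimate described above is a rate-limited model: each servo is assumed to move toward its commanded position at no more than a fixed speed. The sketch below is only an illustration of that idea. The `ServoEstimate` type, the `step_estimate` function, and the `max_speed` value are hypothetical, and a real servo under load will deviate from this model, which is exactly the drift problem noted above.

```python
from dataclasses import dataclass


@dataclass
class ServoEstimate:
    position: float        # estimated joint angle in radians
    velocity: float = 0.0  # estimated joint velocity in rad/s


def step_estimate(estimate, command, dt, max_speed):
    """Advance the estimate by dt seconds toward the commanded angle.

    Assumes the servo moves at up to max_speed (rad/s) and stops once it
    reaches the command. External load is not modeled.
    """
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    error = command - estimate.position
    max_step = max_speed * dt
    step = max(-max_step, min(max_step, error))
    return ServoEstimate(position=estimate.position + step, velocity=step / dt)
```

Calling `step_estimate` once per control step for each joint yields the position and velocity guesses that would stand in for encoder readings.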

Training Process & Results

Training Methodology

We are using Proximal Policy Optimization (PPO) as our RL algorithm, which has proven effective for robot locomotion training and is conveniently integrated into our Genesis training pipeline.

Our training uses a teacher-student setup: the "privileged" teacher model has access to full state information in simulation (exact joint positions, velocities, foot contact sensors, etc.), while the "unprivileged" student model only has access to the same limited sensor data that the real robot will have (IMU data, commanded joint positions). The teacher guides the student's learning, so the student's walking behavior benefits from information it could never observe directly.
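The sketch below illustrates the observation split between the two models and one common way a teacher can guide a student: penalizing the difference between their actions on the same simulated state. The action-matching loss is an assumption chosen for illustration, not necessarily the distillation objective the team uses. The dictionary keys and function names are also hypothetical.

```python
def student_observation(sim_state):
    """Signals the real robot can provide: IMU data and commanded joints."""
    return (
        list(sim_state["imu_orientation"])
        + list(sim_state["imu_acceleration"])
        + list(sim_state["commanded_joint_positions"])
    )


def teacher_observation(sim_state):
    """Everything the student sees, plus simulator-only ground truth."""
    return (
        student_observation(sim_state)
        + list(sim_state["joint_positions"])
        + list(sim_state["joint_velocities"])
        + list(sim_state["foot_contacts"])
    )


def imitation_loss(student_action, teacher_action):
    """Mean squared error between the two policies' joint targets."""
    if len(student_action) != len(teacher_action):
        raise ValueError("actions must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_action, teacher_action)]
    return sum(squared) / len(squared)
```

Building the teacher observation on top of the student observation keeps the two in sync: any signal added to the student is automatically available to the teacher.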

Our reward function is designed to encourage efficient and stable locomotion for an arbitrary [speed, direction, rotation speed] command. This includes terms for linear and angular velocity tracking, survival, and smoothness of motion. We also incorporated a reward term for energy efficiency inspired by this paper and used many of the same domain randomization techniques described therein.
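To make the structure of such a reward concrete, here is a minimal sketch that combines the terms listed above. The exponential tracking kernel, the mechanical-power penalty used as the energy term, and every weight are assumptions for illustration. They are not the team's tuned values, nor necessarily the formulation from the referenced paper.

```python
import math

# Hypothetical weights; real values would be tuned during training.
DEFAULT_WEIGHTS = {
    "lin_vel_tracking": 1.0,
    "ang_vel_tracking": 0.5,
    "survival": 0.1,
    "smoothness": 0.01,
    "energy": 0.001,
}


def command_to_velocity(speed, direction_rad):
    """Turn a [speed, direction] command into a planar velocity (vx, vy)."""
    return (speed * math.cos(direction_rad), speed * math.sin(direction_rad))


def locomotion_reward(cmd_lin_vel, base_lin_vel, cmd_yaw_rate, base_yaw_rate,
                      action, prev_action, joint_torques, joint_velocities,
                      weights=None, tracking_sigma=0.25):
    """Sum of tracking and survival bonuses minus smoothness and energy costs."""
    w = DEFAULT_WEIGHTS if weights is None else weights
    lin_err = sum((c - b) ** 2 for c, b in zip(cmd_lin_vel, base_lin_vel))
    ang_err = (cmd_yaw_rate - base_yaw_rate) ** 2
    action_change = sum((a - p) ** 2 for a, p in zip(action, prev_action))
    power = sum(abs(t * v) for t, v in zip(joint_torques, joint_velocities))
    return (
        w["lin_vel_tracking"] * math.exp(-lin_err / tracking_sigma)
        + w["ang_vel_tracking"] * math.exp(-ang_err / tracking_sigma)
        + w["survival"]
        - w["smoothness"] * action_change
        - w["energy"] * power
    )
```

The tracking terms are bounded between 0 and their weight, so a large velocity error cannot swamp the other terms, while the smoothness and energy penalties grow without bound and discourage jerky or wasteful motion.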

To ensure effective sim-to-real transfer, we employ extensive domain randomization including:

- Random payload mass and position somewhere on the body
- Servo PID parameters
- Friction coefficients of the ground
- Sensor noise characteristics
- Gravity vector direction (to simulate inclines)

We also include a fixed command delay of one control step to simulate latency in the system.
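The randomization list and the one-step delay above can be sketched in a simulator-agnostic way as follows. All ranges, field names, and the `EpisodeParams` and `DelayedCommand` helpers are hypothetical, and applying the sampled values to Genesis is deliberately left out because it depends on the simulator's own API.

```python
import math
import random
from collections import deque
from dataclasses import dataclass

GRAVITY = 9.81  # m/s^2


@dataclass(frozen=True)
class EpisodeParams:
    payload_mass_kg: float
    payload_offset_m: tuple   # (x, y) position of the payload on the body
    servo_pid_scale: tuple    # multipliers on nominal (kp, ki, kd)
    ground_friction: float
    imu_noise_std: float
    gravity: tuple            # tilted gravity vector simulating an incline


def sample_episode_params(rng):
    """Draw one set of randomized parameters (all ranges are placeholders)."""
    tilt = rng.uniform(0.0, math.radians(10.0))
    heading = rng.uniform(0.0, 2.0 * math.pi)
    gravity = (
        GRAVITY * math.sin(tilt) * math.cos(heading),
        GRAVITY * math.sin(tilt) * math.sin(heading),
        -GRAVITY * math.cos(tilt),
    )
    return EpisodeParams(
        payload_mass_kg=rng.uniform(0.0, 0.5),
        payload_offset_m=(rng.uniform(-0.05, 0.05), rng.uniform(-0.03, 0.03)),
        servo_pid_scale=tuple(rng.uniform(0.8, 1.2) for _ in range(3)),
        ground_friction=rng.uniform(0.4, 1.2),
        imu_noise_std=rng.uniform(0.0, 0.05),
        gravity=gravity,
    )


class DelayedCommand:
    """Return each command delay_steps control steps after it was pushed."""

    def __init__(self, initial_command, delay_steps=1):
        if delay_steps < 1:
            raise ValueError("delay_steps must be at least 1")
        self._buffer = deque([initial_command] * delay_steps)

    def push(self, command):
        self._buffer.append(command)
        return self._buffer.popleft()
```

Tilting the gravity vector while keeping its magnitude fixed emulates a slope without changing the terrain geometry. With `delay_steps=1`, the command applied at each step is the one the policy produced on the previous step.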

Performance Results

Our training has shown promising results in simulation, with the robot learning to walk and navigate towards target locations effectively. Currently we are just working with the privileged teacher model, which has been able to achieve stable locomotion at a variety of speeds and directions. The unprivileged student model is still in the process of being set up and connected, but we are optimistic about its potential given the success of the teacher. The energy efficiency rewards have also led to smoother and more natural transitions between types of gaits (like walking to trotting), which is encouraging for future work on more dynamic movement.

Genesis running ~100 simultaneous robot simulations for training!

Current Status & Ongoing Work

As of now, we have successfully set up the simulation environment and trained the privileged teacher model to achieve stable locomotion in Genesis. The next steps involve integrating the unprivileged student model into the training pipeline and beginning real-world testing on Servobot to evaluate sim-to-real transfer performance. This will require setting up the ROS2 inference node on the Raspberry Pi, which is my main focus for the moment.

I've learned a tremendous amount about reinforcement learning, robotics simulation, and real-time control systems through this project so far. We started with a substantially more rudimentary system running in PyBullet, and building up the fundamentals in that environment before transitioning to Genesis has really deepened my understanding of exactly how these systems work. The challenges of working with limited hardware and sensor data have also pushed me to think creatively about state estimation and control strategies, which has been incredibly rewarding.

Achievements To Date:

- Successfully set up a simulation environment and trained a rudimentary model using PyBullet and scikit-learn
- Successfully set up a simulation environment and trained a more complex model with RSL-RL and Genesis
- Observed the development of dynamic gait patterns, including transitions between walking and trotting driven by the energy optimization reward
- Remotely set up ROS2 environment on Servobot

Next Steps

My next steps are to address some lingering issues with the learned behavior in the simulation by tweaking the reward function and training parameters, integrate the unprivileged student model into the training pipeline, and set up the ROS2 nodes on Servobot to begin real-world testing. I'm also looking forward to collaborating more closely with our mechanical and electrical teams as they finalize the full robot design, which will open up new possibilities for perception and control strategies. Theoretically our training process should translate nicely from Servobot to Catbot, but there will undoubtedly be new challenges to tackle once we have the full hardware platform up and running! :)