FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control

Younggyo Seo; Carmelo Sferrazza; Haoran Geng; Michal Nauman; Zhao-Heng Yin; Pieter Abbeel

arXiv:2505.22642·cs.RO·June 3, 2025

TL;DR

FastTD3 is a TD3 variant that speeds up reinforcement learning for humanoid control. It solves a range of HumanoidBench tasks in under 3 hours on a single A100 GPU by combining parallel simulation, large-batch updates, a distributional critic, and carefully tuned hyperparameters.

Contribution

We introduce FastTD3, a simple and fast off-policy RL algorithm built on TD3 that trains humanoid control policies on a single GPU.

Findings

01 Solves a range of HumanoidBench tasks in under 3 hours on a single A100 GPU

02 Remains stable during training

03 Provides a lightweight, easy-to-use implementation to accelerate RL research in robotics

Abstract

Reinforcement learning (RL) has driven significant progress in robotics, but its complexity and long training times remain major bottlenecks. In this report, we introduce FastTD3, a simple, fast, and capable RL algorithm that significantly speeds up training for humanoid robots in popular suites such as HumanoidBench, IsaacLab, and MuJoCo Playground. Our recipe is remarkably simple: we train an off-policy TD3 agent with several modifications: parallel simulation, large-batch updates, a distributional critic, and carefully tuned hyperparameters. FastTD3 solves a range of HumanoidBench tasks in under 3 hours on a single A100 GPU, while remaining stable during training. We also provide a lightweight and easy-to-use implementation of FastTD3 to accelerate RL research in robotics.
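
For readers unfamiliar with the TD3 base that the abstract builds on, the following is a minimal sketch of two standard TD3 ingredients: target policy smoothing and the clipped double Q-learning target. It is not taken from the FastTD3 implementation. The function names are hypothetical, the defaults (gamma of 0.99, noise_std of 0.2, noise_clip of 0.5) are common TD3 choices assumed here for illustration, and the sketch uses scalar critics rather than the distributional critic the abstract mentions.

```python
import random


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                           act_limit=1.0, rng=random):
    """Target policy smoothing: perturb the target policy's action with
    clipped Gaussian noise, then clip the result to the action bounds.

    All default values are illustrative assumptions, not FastTD3 settings.
    """
    smoothed = []
    for a in target_action:
        # Sample noise and clip it to [-noise_clip, noise_clip].
        noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        # Keep the perturbed action inside [-act_limit, act_limit].
        smoothed.append(max(-act_limit, min(act_limit, a + noise)))
    return smoothed


def td3_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Clipped double Q-learning target: bootstrap from the smaller of the
    two target-critic estimates to reduce overestimation.

    `done` is 1.0 for terminal transitions and 0.0 otherwise.
    """
    return reward + gamma * (1.0 - done) * min(next_q1, next_q2)
```

In a full agent, both target critics would be evaluated at the smoothed action, and `td3_target` would be applied across the large batches sampled from the replay buffer.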

Taxonomy

Topics: Autonomous Vehicle Technology and Safety

Methods: Adam · Dense Connections · Experience Replay · Target Policy Smoothing · Clipped Double Q-learning · Twin Delayed Deep Deterministic