Reset-Free Reinforcement Learning for Real-World Agile Driving: An Empirical Study
Kohei Honda, Hirotaka Hosogaya

TL;DR
This empirical study investigates reset-free reinforcement learning for real-world agile driving on a 1/10-scale vehicle. It compares PPO, SAC, and TD-MPC2, with and without residual learning, and highlights the challenges of sim-to-real transfer.
Contribution
The study systematically compares RL algorithms for agile driving in simulation and on a physical vehicle, revealing the limitations of residual learning and the gap between simulated and real-world performance.
Findings
SAC with residual learning performs best in simulation.
TD-MPC2 outperforms the MPPI baseline on the physical platform.
The benefits of residual learning do not transfer well from simulation to the real world.
Abstract
This paper presents an empirical study of reset-free reinforcement learning (RL) for real-world agile driving, in which a physical 1/10-scale vehicle learns continuously on a slippery indoor track without manual resets. High-speed driving near the limits of tire friction is particularly challenging for learning-based methods because complex vehicle dynamics, actuation delays, and other unmodeled effects hinder both accurate simulation and direct sim-to-real transfer of learned policies. To enable autonomous training on a physical platform, we employ Model Predictive Path Integral control (MPPI) as both the reset policy and the base policy for residual learning, and systematically compare three representative RL algorithms, i.e., PPO, SAC, and TD-MPC2, with and without residual learning in simulation and real-world experiments. Our results reveal a clear gap between simulation and…
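The dual role of the base controller described in the abstract can be illustrated with a minimal sketch. It assumes a common additive form of residual learning, in which the learned policy outputs a correction that is scaled and added to the base controller's action; the abstract does not give the paper's exact formulation. The function names, the `scale` value, the normalized action bounds, and the `needs_reset` flag are all hypothetical and chosen only for illustration.

```python
import numpy as np


def residual_action(base_action, residual, scale=0.2, low=-1.0, high=1.0):
    """Add a scaled learned correction to the base controller's action.

    Assumption: actions are normalized to [low, high]; `scale` is an
    illustrative value, not one taken from the paper.
    """
    base_action = np.asarray(base_action, dtype=float)
    residual = np.asarray(residual, dtype=float)
    return np.clip(base_action + scale * residual, low, high)


def select_action(base_action, policy_output, needs_reset, use_residual=True):
    """Choose the command for one control step in a reset-free loop.

    base_action   : action proposed by the base controller (e.g. MPPI)
    policy_output : output of the learned RL policy
    needs_reset   : hypothetical flag, True when the vehicle must be
                    brought back to a drivable state without human help
    """
    if needs_reset:
        # The base controller acts alone as the reset policy.
        return np.asarray(base_action, dtype=float)
    if use_residual:
        # Residual learning: the policy only corrects the base action.
        return residual_action(base_action, policy_output)
    # Without residual learning the policy output is used directly.
    return np.clip(np.asarray(policy_output, dtype=float), -1.0, 1.0)
```

In this sketch the same base action serves two purposes: it is the fallback command during recovery and the starting point that the learned policy refines during normal driving. How recovery is detected and how the residual is scaled are design choices that the abstract leaves unspecified.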
