TADPO: Reinforcement Learning Goes Off-road
Zhouchonghao Wu, Raymond Song, Vedant Mundheda, Luis E. Navarro-Serment, Christof Schoenborn, Jeff Schneider

TL;DR
This paper introduces TADPO, a reinforcement learning method for off-road autonomous driving that achieves high-speed, zero-shot sim-to-real transfer on a full-scale vehicle in complex terrain.
Contribution
The paper presents TADPO, a new policy gradient approach extending PPO with off-policy guidance, and demonstrates its successful deployment on a real off-road vehicle.
Findings
Effective zero-shot sim-to-real transfer achieved
First RL-based policy deployed on a full-scale off-road vehicle
High-speed navigation in complex terrains demonstrated
Abstract
Off-road autonomous driving poses significant challenges such as navigating unmapped, variable terrain with uncertain and diverse dynamics. Addressing these challenges requires effective long-horizon planning and adaptable control. Reinforcement Learning (RL) offers a promising solution by learning control policies directly from interaction. However, because off-road driving is a long-horizon task with low-signal rewards, standard RL methods are challenging to apply in this setting. We introduce TADPO, a novel policy gradient formulation that extends Proximal Policy Optimization (PPO), leveraging off-policy trajectories for teacher guidance and on-policy trajectories for student exploration. Building on this, we develop a vision-based, end-to-end RL system for high-speed off-road driving, capable of navigating extreme slopes and obstacle-rich terrain. We demonstrate our performance in…
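The abstract names PPO as the base that TADPO extends, but the TADPO objective itself is not given on this page. As a point of reference, here is a minimal sketch of the standard PPO clipped surrogate in plain Python. The `mixed_objective` helper and its `teacher_weight` parameter are hypothetical, added only to illustrate how an on-policy (student) term and an off-policy (teacher) term might be blended; they are an assumption and not the paper's formulation.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate (to be maximized), averaged over samples.

    logp_new:   log-probabilities of the taken actions under the current policy
    logp_old:   log-probabilities under the policy that collected the data
    advantages: advantage estimates for the same samples
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


def mixed_objective(student_batch, teacher_batch, teacher_weight=0.5, clip_eps=0.2):
    """HYPOTHETICAL blend of on-policy and off-policy terms (not from the paper).

    Each batch is a (logp_new, logp_old, advantages) tuple. For the teacher
    batch, logp_old is assumed to be the teacher's log-probability of its
    own actions.
    """
    on_policy = ppo_clip_objective(*student_batch, clip_eps=clip_eps)
    off_policy = ppo_clip_objective(*teacher_batch, clip_eps=clip_eps)
    return (1.0 - teacher_weight) * on_policy + teacher_weight * off_policy
```

With `clip_eps=0.2`, a sample whose importance ratio is 2 and whose advantage is positive contributes only 1.2 times the advantage, which is how the clipping limits the size of a single policy update.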
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotic Path Planning Algorithms · Robotic Locomotion and Control
