Learning to Race in Minutes: Infoprop Dyna on the Mini Wheelbot

Devdutt Subhasish, Henrik Hose, and Sebastian Trimpe

arXiv:2605.01096·cs.LG·May 5, 2026

TL;DR

This paper demonstrates that Infoprop Dyna, a model-based reinforcement learning framework, enables a Mini Wheelbot to learn to race in just 11 minutes of real-world interaction without relying on simulators.

Contribution

The paper shows that Infoprop Dyna can learn effective control policies for an underactuated robot directly from real-world data, with no simulator in the loop.

Findings

01

The Mini Wheelbot learns to race around a track within 11 minutes of real-world experience.

02

Infoprop Dyna successfully handles the robot's nonlinear, unstable dynamics.

03

The approach eliminates reliance on physics-based simulators for training.

Abstract

Reinforcement Learning (RL) has the potential to enable robots with fast, nonlinear, and unstable dynamics to reach the limits of their performance. However, most recent advances rely on carefully designed physics-based simulators and domain randomization to achieve successful sim-to-real transfer within reasonable wall-clock time. In this work, we bypass the need for such simulators and demonstrate that Infoprop Dyna, a state-of-the-art uncertainty-aware model-based reinforcement learning (MBRL) framework, can enable robots to learn directly from real-world interactions. Using Infoprop Dyna, the Mini Wheelbot, an underactuated unicycle robot, learns to race around a track within 11 minutes of real-world experience.
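
The abstract does not spell out how Infoprop Dyna generates or filters model rollouts, so the following is only a minimal sketch of the general idea behind uncertainty-aware Dyna-style MBRL, not the authors' method. It assumes a hypothetical 1-D toy plant (`true_step`) in place of the robot, a bootstrap ensemble of linear models, and a simple ensemble-disagreement threshold (`max_std`) that truncates a synthetic rollout once the models stop agreeing. All function names and constants are illustrative.

```python
import random
import statistics


def true_step(x, u, rng):
    # Hypothetical toy plant standing in for the real robot:
    # an unstable 1-D linear system with small process noise.
    return 1.05 * x + 0.5 * u + rng.gauss(0.0, 0.01)


def collect_real(n_steps, rng):
    # Gather real transitions (x, u, x_next) with random exploration.
    x, data = 0.1, []
    for _ in range(n_steps):
        u = rng.uniform(-1.0, 1.0)
        x_next = true_step(x, u, rng)
        data.append((x, u, x_next))
        # Reset when the unstable state leaves the operating region.
        x = x_next if abs(x_next) < 2.0 else 0.1
    return data


def fit_linear(data):
    # Least-squares fit of x_next ~ a*x + b*u via the 2x2 normal equations.
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b


def fit_ensemble(data, n_models, rng):
    # Each member is fit on a bootstrap resample, so members disagree
    # more where the data is scarce.
    return [fit_linear(rng.choices(data, k=len(data))) for _ in range(n_models)]


def model_rollout(ensemble, x0, policy, horizon, max_std, rng):
    # Generate synthetic transitions from the learned models, stopping
    # as soon as ensemble disagreement exceeds max_std.
    x, synthetic = x0, []
    for _ in range(horizon):
        u = policy(x)
        preds = [a * x + b * u for a, b in ensemble]
        if statistics.pstdev(preds) > max_std:
            break
        x_next = rng.choice(preds)  # sample one ensemble member
        synthetic.append((x, u, x_next))
        x = x_next
    return synthetic


def policy(x):
    # Illustrative saturated linear feedback, not a learned policy.
    return max(-1.0, min(1.0, -0.4 * x))


rng = random.Random(0)
real_data = collect_real(200, rng)
ensemble = fit_ensemble(real_data, 5, rng)
synthetic = model_rollout(ensemble, 0.5, policy, horizon=20, max_std=0.05, rng=rng)
print(len(real_data), "real transitions;", len(synthetic), "synthetic transitions")
```

In a full Dyna loop, the synthetic transitions would be added to the replay data used to update the policy, and the collect, fit, rollout cycle would repeat. The threshold gate is one common heuristic for keeping model error out of the policy update; the paper itself should be consulted for the mechanism Infoprop Dyna actually uses.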
