What Matters for Simulation to Online Reinforcement Learning on Real Robots

Yarden As; Dhruva Tirumala; René Zurbrügg; Chenhao Li; Stelian Coros; Andreas Krause; Markus Wulfmeier

arXiv:2602.20220·cs.RO·February 25, 2026

TL;DR

This paper systematically studies the design choices that enable successful online reinforcement learning on real robots, providing empirical insights to improve deployment stability and reduce engineering effort.

Contribution

The paper offers the first large-sample empirical analysis of algorithmic and system design choices for online RL on physical robots, identifying which widely used defaults can be harmful and which choices yield stable learning.

Findings

01. Some common defaults can be harmful

02. Robust design choices lead to stable learning

03. Empirical results across multiple robots and tasks

Abstract

We investigate what specific design choices enable successful online reinforcement learning (RL) on physical robots. Across 100 real-world training runs on three distinct robotic platforms, we systematically ablate algorithmic, systems, and experimental decisions that are typically left implicit in prior work. We find that some widely used defaults can be harmful, while a set of robust, readily adopted design choices within standard RL practice yield stable learning across tasks and hardware. These results provide the first large-sample empirical study of such design choices, enabling practitioners to deploy online RL with lower engineering effort.

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Advanced Bandit Algorithms Research