Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning
Jiajun Hu, Nuria Armengol Urpi, Jin Cheng, Stelian Coros

TL;DR
This paper introduces FB-MEBE, an online zero-shot RL algorithm that maximizes the entropy of explored behaviors and applies behavior regularization, producing natural, deployable policies for quadrupedal robots while improving exploration and downstream performance.
Contribution
The paper proposes FB-MEBE, a novel online zero-shot RL method that combines entropy-based exploration with behavior regularization to yield policies suitable for real-world robot deployment.
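This summary does not say how FB-MEBE estimates behavior entropy. Purely as a generic illustration, and not the paper's estimator, one common nonparametric way to score the diversity of a batch of samples is a k-nearest-neighbour (particle) estimate. The sketch assumes behaviors are represented as fixed-length embedding vectors; the function name `knn_entropy_proxy` and the toy data are hypothetical.

```python
import numpy as np


def knn_entropy_proxy(samples: np.ndarray, k: int = 5) -> float:
    """Mean log distance from each sample to its k-th nearest neighbour.

    Up to a positive dimension factor and additive constants, this matches the
    Kozachenko-Leonenko particle estimate of differential entropy, so more
    spread-out samples give larger values. (Hypothetical helper.)
    """
    n = samples.shape[0]
    if n <= k:
        raise ValueError("need more samples than k")
    # Pairwise Euclidean distances, shape (n, n).
    diffs = samples[:, None, :] - samples[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Row-wise sort: column 0 is each point's zero distance to itself,
    # so column k holds the distance to the k-th nearest other point.
    kth = np.sort(dists, axis=1)[:, k]
    # Small epsilon guards against log(0) for duplicate samples.
    return float(np.mean(np.log(kth + 1e-8)))


# Hypothetical behavior embeddings: the same point cloud at two scales.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 8))
narrow = knn_entropy_proxy(0.1 * base)  # low-diversity exploration
wide = knn_entropy_proxy(base)          # high-diversity exploration
print(wide - narrow)  # about log(10) ≈ 2.303
```

Because the proxy is built from log distances, shrinking the same cloud by a factor of 10 lowers the score by exactly log(10), which is the sense in which an exploration objective built on such a term rewards spreading behaviors apart.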
Findings
FB-MEBE outperforms other exploration strategies in simulated downstream tasks.
FB-MEBE produces natural behaviors suitable for direct hardware deployment.
The approach improves both the diversity of exploration and the plausibility of the learned policies.
Abstract
Zero-shot reinforcement learning (RL) algorithms aim to learn a family of policies from a reward-free dataset, and recover optimal policies for any reward function directly at test time. Naturally, the quality of the pretraining dataset determines the performance of the recovered policies across tasks. However, pre-collecting a relevant, diverse dataset without prior knowledge of the downstream tasks of interest remains a challenge. In this work, we study zero-shot RL for quadrupedal control on real robotic systems, building upon the Forward-Backward (FB) algorithm. We observe that undirected exploration yields low-diversity data, leading to poor downstream performance and rendering policies impractical for direct hardware deployment. Therefore, we introduce FB-MEBE, an online zero-shot RL algorithm that combines an unsupervised behavior exploration strategy with a…
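In the standard Forward-Backward framework, reward-free pretraining learns a forward map F(s, a, z) and a backward map B(s). At test time a reward r is converted into a task vector z_r = E_{s~D}[r(s) B(s)] over dataset states, and the policy acts greedily on Q_z(s, a) = F(s, a, z)^T z. The sketch below illustrates only this test-time step under loudly stated assumptions: F and B are hypothetical fixed random maps standing in for trained networks, actions are chosen from a small candidate set instead of by a trained actor, and rescaling z to norm sqrt(Z_DIM) is an assumed convention.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, Z_DIM, N_ACTIONS = 4, 2, 16, 5

# Hypothetical stand-ins for trained networks: fixed random weights.
W_B = rng.normal(size=(STATE_DIM, Z_DIM))
W_F = rng.normal(size=(STATE_DIM + ACTION_DIM + Z_DIM, Z_DIM))


def backward(states: np.ndarray) -> np.ndarray:
    """B(s): maps (n, STATE_DIM) states to (n, Z_DIM) embeddings."""
    return states @ W_B


def forward(state: np.ndarray, actions: np.ndarray, z: np.ndarray) -> np.ndarray:
    """F(s, a, z) for one state and a batch of actions: returns (n_actions, Z_DIM)."""
    n = actions.shape[0]
    inputs = np.concatenate(
        [np.tile(state, (n, 1)), actions, np.tile(z, (n, 1))], axis=1
    )
    return np.tanh(inputs @ W_F)


def infer_z(states: np.ndarray, rewards: np.ndarray) -> np.ndarray:
    """Task inference z_r = E_{s~D}[r(s) B(s)], rescaled to norm sqrt(Z_DIM)."""
    z = (rewards[:, None] * backward(states)).mean(axis=0)
    return np.sqrt(Z_DIM) * z / np.linalg.norm(z)


def act(state: np.ndarray, candidate_actions: np.ndarray, z: np.ndarray) -> int:
    """Greedy choice: index of the candidate maximizing Q_z(s, a) = F(s, a, z)^T z."""
    q_values = forward(state, candidate_actions, z) @ z
    return int(np.argmax(q_values))


# Usage: a reward-free dataset, with a reward revealed only at test time.
states = rng.normal(size=(1000, STATE_DIM))
rewards = states[:, 0]  # example task: reward = first state coordinate
z = infer_z(states, rewards)
candidates = rng.uniform(-1.0, 1.0, size=(N_ACTIONS, ACTION_DIM))
best = act(states[0], candidates, z)
```

Task inference only averages rewards over states already in the dataset, so a reward that is never informative on the collected states cannot be recovered well; this is why the abstract ties downstream performance to the diversity of the pretraining data.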
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning
