Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning

Jiajun Hu; Nuria Armengol Urpi; Jin Cheng; Stelian Coros

arXiv:2603.25464 · cs.LG · March 27, 2026

Open Access

TL;DR

This paper introduces FB-MEBE, an online zero-shot RL algorithm that maximizes behavior entropy during exploration and uses a regularization term to produce natural, deployable policies for quadrupedal robots, improving both exploration diversity and downstream task performance.

Contribution

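The paper proposes FB-MEBE, an online zero-shot RL method built on the Forward-Backward (FB) algorithm that combines entropy-based behavior exploration with behavior regularization, so that the learned policies can be deployed directly on real quadrupedal hardware.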

Findings

01. FB-MEBE outperforms other exploration strategies in simulated downstream tasks.

02. FB-MEBE produces natural behaviors suitable for direct hardware deployment.

03. The approach enhances exploration diversity and policy plausibility.

Abstract

Zero-shot reinforcement learning (RL) algorithms aim to learn a family of policies from a reward-free dataset, and recover optimal policies for any reward function directly at test time. Naturally, the quality of the pretraining dataset determines the performance of the recovered policies across tasks. However, pre-collecting a relevant, diverse dataset without prior knowledge of the downstream tasks of interest remains a challenge. In this work, we study online zero-shot RL for quadrupedal control on real robotic systems, building upon the Forward-Backward (FB) algorithm. We observe that undirected exploration yields low-diversity data, leading to poor downstream performance and rendering policies impractical for direct hardware deployment. Therefore, we introduce FB-MEBE, an online zero-shot RL algorithm that combines an unsupervised behavior exploration strategy with a…
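
As a rough illustration of what recovering a policy for a new reward function at test time can look like in a Forward-Backward model, the sketch below follows the standard FB recipe: average reward-weighted backward embeddings to obtain a task vector z, then act greedily on the inner product of F(s, a, z) and z. The plain arrays stand in for trained networks, and the function names `infer_task_embedding` and `greedy_action` are hypothetical. This is not the FB-MEBE implementation, whose exploration and regularization terms the abstract does not detail.

```python
import numpy as np


def infer_task_embedding(backward_embeddings, rewards):
    """Estimate the task vector z as the mean of r(s_i) * B(s_i).

    backward_embeddings: array of shape (n, d), stand-in for B(s_i).
    rewards: array of shape (n,), reward labels r(s_i) for the same states.
    """
    return (rewards[:, None] * backward_embeddings).mean(axis=0)


def greedy_action(forward_embeddings, z):
    """Pick argmax_a F(s, a, z)^T z for a single state.

    forward_embeddings: array of shape (num_actions, d), stand-in for
    F(s, a, z) evaluated at one state for each discrete action (assumed
    discrete here only to keep the example small).
    """
    q_values = forward_embeddings @ z  # one Q estimate per action
    return int(np.argmax(q_values))


# Toy data: two reward-labeled states with 2-dimensional embeddings.
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])
r = np.array([2.0, 4.0])
z = infer_task_embedding(B, r)

# Toy forward embeddings for two candidate actions in one state.
F = np.array([[1.0, 0.0],
              [0.0, 1.0]])
action = greedy_action(F, z)
print(z, action)
```

No gradient step appears anywhere in this procedure, which is why the quality of the states used to train F and B (and hence the exploration strategy) matters so much for downstream performance.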

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning