Dual-Granularity Contrastive Reward via Generated Episodic Guidance for Efficient Embodied RL
Xin Liu, Yixuan Li, Yuhui Chen, Yuxing Qin, Haoran Li, Dongbin Zhao

TL;DR
This paper introduces DEG, a novel reward framework that uses generated episodic guidance from video models to improve sample efficiency in embodied RL without human annotations.
Contribution
DEG leverages large video generation models and expert videos to create dense, task-specific rewards, enhancing RL efficiency and stability without extensive supervision.
Findings
DEG accelerates the discovery of successful episodes across diverse tasks
It improves policy stability and convergence
It is effective in both simulated and real-world environments
Abstract
Designing suitable rewards poses a significant challenge in reinforcement learning (RL), especially for embodied manipulation. Trajectory success rewards are well suited to human judgment or model fitting, but their sparsity severely limits RL sample efficiency. While recent methods have effectively improved RL via dense rewards, they rely heavily on high-quality human-annotated data or abundant expert supervision. To tackle these issues, this paper proposes Dual-granularity contrastive reward via generated Episodic Guidance (DEG), a novel framework that seeks sample-efficient dense rewards without requiring human annotations or extensive supervision. Leveraging the prior knowledge of large video generation models, DEG needs only a small number of expert videos for domain adaptation to generate dedicated task guidance for each RL episode. Then, the proposed dual-granularity reward that balances…
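The abstract contrasts sparse trajectory success rewards with dense rewards derived from task guidance generated for each episode. The following is a minimal sketch of that general distinction only, under stated assumptions. It assumes observations and guidance frames have already been encoded as embedding vectors by some frozen encoder, that the guidance video is aligned to the episode by simple linear time scaling, and that per-step reward is the cosine similarity to the aligned guidance frame. All function names are hypothetical. This is not DEG's dual-granularity contrastive reward, whose details are not given in this excerpt.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sparse_success_reward(num_steps: int, success: bool) -> List[float]:
    """Trajectory success reward: zero at every step, 1.0 at the last step on success."""
    rewards = [0.0] * num_steps
    if success and num_steps > 0:
        rewards[-1] = 1.0
    return rewards


def aligned_guidance_index(t: int, num_steps: int, num_guide_frames: int) -> int:
    """Hypothetical alignment: map episode step t linearly onto the guidance timeline."""
    if num_steps <= 1:
        return 0
    return round(t * (num_guide_frames - 1) / (num_steps - 1))


def guidance_dense_reward(
    obs_embs: Sequence[Sequence[float]],
    guide_embs: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical dense reward: at each step, the similarity between the agent's
    observation embedding and the time-aligned frame of the generated guidance."""
    num_steps, num_frames = len(obs_embs), len(guide_embs)
    rewards = []
    for t, obs in enumerate(obs_embs):
        g = aligned_guidance_index(t, num_steps, num_frames)
        rewards.append(cosine_similarity(obs, guide_embs[g]))
    return rewards
```

For example, with `guide = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]` and `obs = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]`, `sparse_success_reward(3, True)` gives signal only at the final step (`[0.0, 0.0, 1.0]`), whereas `guidance_dense_reward(obs, guide)` assigns a reward at every step (about `[1.0, 0.707, 1.0]`). That per-step signal is what makes dense rewards attractive for sample efficiency.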
Taxonomy
Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
