Dual-Granularity Contrastive Reward via Generated Episodic Guidance for Efficient Embodied RL

Xin Liu; Yixuan Li; Yuhui Chen; Yuxing Qin; Haoran Li; Dongbin Zhao

arXiv:2602.12636·cs.LG·February 16, 2026

TL;DR

This paper introduces DEG, a novel reward framework that uses generated episodic guidance from video models to improve sample efficiency in embodied RL without human annotations.

Contribution

DEG leverages large video generation models and a small number of expert videos to create dense, task-specific rewards, improving RL sample efficiency and stability without human annotations or extensive supervision.

Findings

1. DEG accelerates discovery of success in diverse tasks
2. It improves policy stability and convergence
3. Effective in both simulation and real-world environments

Abstract

Designing suitable rewards poses a significant challenge in reinforcement learning (RL), especially for embodied manipulation. Trajectory success rewards are suitable for human judges or model fitting, but their sparsity severely limits RL sample efficiency. While recent methods have effectively improved RL via dense rewards, they rely heavily on high-quality human-annotated data or abundant expert supervision. To tackle these issues, this paper proposes Dual-granularity contrastive reward via generated Episodic Guidance (DEG), a novel framework to seek sample-efficient dense rewards without requiring human annotations or extensive supervision. Leveraging the prior knowledge of large video generation models, DEG needs only a small number of expert videos for domain adaptation to generate dedicated task guidance for each RL episode. Then, the proposed dual-granularity reward that balances…

Taxonomy

Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis