From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning
Zhanyi Sun, Shuran Song

TL;DR
DICE-RL is a reinforcement learning framework that refines pretrained robot policies into high-performing skills through distribution contraction, achieving stable, sample-efficient mastery of complex manipulation tasks from pixel inputs.
Contribution
The paper introduces DICE-RL, which treats RL as a distribution contraction operator: a pretrained diffusion- or flow-based policy is finetuned with residual off-policy RL that combines selective behavior regularization with value-guided action selection.
Findings
DICE-RL reliably improves the performance of pretrained policies on complex, long-horizon manipulation tasks.
Finetuning is stable and sample-efficient.
Skills are learned directly from high-dimensional pixel inputs, both in simulation and on a real robot.
Abstract
We introduce Distribution Contractive Reinforcement Learning (DICE-RL), a framework that uses reinforcement learning (RL) as a "distribution contraction" operator to refine pretrained generative robot policies. DICE-RL turns a pretrained behavior prior into a high-performing "pro" policy by amplifying high-success behaviors from online feedback. We pretrain a diffusion- or flow-based policy for broad behavioral coverage, then finetune it with a stable, sample-efficient residual off-policy RL framework that combines selective behavior regularization with value-guided action selection. Extensive experiments and analyses show that DICE-RL reliably improves performance with strong stability and sample efficiency. It enables mastery of complex long-horizon manipulation skills directly from high-dimensional pixel inputs, both in simulation and on a real robot. Project website:…
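To make "value-guided action selection" on top of a pretrained prior more concrete, here is a minimal sketch of one common way such a scheme can look: draw several candidate actions from the prior, add a small residual correction, and keep the candidate a critic scores highest. This is an illustration under stated assumptions, not the authors' implementation. The function names (`select_action`, `toy_prior`, `zero_residual`, `toy_q`), the best-of-N rule, and the parameters `num_candidates` and `residual_scale` are all hypothetical, and the Gaussian prior and quadratic critic are toy stand-ins.

```python
import numpy as np


def select_action(obs, sample_prior, residual, q_value,
                  num_candidates=8, residual_scale=0.1, rng=None):
    """Best-of-N selection over residual-corrected samples from a prior.

    All arguments are hypothetical callables standing in for learned models.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw candidate actions from the (frozen) pretrained prior.
    base = np.stack([sample_prior(obs, rng) for _ in range(num_candidates)])
    # Add a small correction to each candidate (the "residual" part).
    corrections = np.stack([residual(obs, a) for a in base])
    candidates = base + residual_scale * corrections
    # Score every candidate with the critic and keep the highest-valued one.
    scores = np.array([q_value(obs, a) for a in candidates])
    return candidates[int(np.argmax(scores))]


# Toy stand-ins, for illustration only (not from the paper).
TARGET = np.array([1.0, -0.5])


def toy_prior(obs, rng):
    # Broad prior: a wide Gaussian over 2-D actions.
    return rng.normal(loc=0.0, scale=1.0, size=2)


def zero_residual(obs, action):
    # An untrained residual that leaves the prior sample unchanged.
    return np.zeros_like(action)


def toy_q(obs, action):
    # Critic that prefers actions close to TARGET.
    return -float(np.sum((action - TARGET) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = np.stack([toy_prior(None, rng) for _ in range(200)])
    picked = np.stack([
        select_action(None, toy_prior, zero_residual, toy_q, rng=rng)
        for _ in range(200)
    ])
    print("mean distance to target, raw prior:",
          np.linalg.norm(raw - TARGET, axis=1).mean())
    print("mean distance to target, selected: ",
          np.linalg.norm(picked - TARGET, axis=1).mean())
```

In this toy setting the selected actions concentrate around the high-value region while staying inside the support of the prior, which is the intuition behind describing the procedure as a contraction of the prior's action distribution.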
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
