From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning

Zhanyi Sun; Shuran Song

arXiv:2603.10263·cs.RO·March 12, 2026



TL;DR

DICE-RL is a reinforcement learning framework that refines pretrained generative robot policies into high-performing skills through distribution contraction, giving stable, sample-efficient mastery of complex manipulation tasks from pixel inputs.

Contribution

The paper introduces DICE-RL, which combines distribution contraction with residual off-policy RL to improve pretrained diffusion- or flow-based policies on complex robotic skills from high-dimensional pixel inputs.

Findings

01

DICE-RL reliably improves performance on long-horizon manipulation tasks, both in simulation and on a real robot.

02

The method demonstrates high stability and sample efficiency.

03

It enables complex skill mastery directly from pixel inputs.

Abstract

We introduce Distribution Contractive Reinforcement Learning (DICE-RL), a framework that uses reinforcement learning (RL) as a "distribution contraction" operator to refine pretrained generative robot policies. DICE-RL turns a pretrained behavior prior into a high-performing "pro" policy by amplifying high-success behaviors from online feedback. We pretrain a diffusion- or flow-based policy for broad behavioral coverage, then finetune it with a stable, sample-efficient residual off-policy RL framework that combines selective behavior regularization with value-guided action selection. Extensive experiments and analyses show that DICE-RL reliably improves performance with strong stability and sample efficiency. It enables mastery of complex long-horizon manipulation skills directly from high-dimensional pixel inputs, both in simulation and on a real robot. Project website:…
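The abstract names two ingredients, a residual correction on top of the pretrained prior and value-guided action selection, without giving implementation details. The following is only a minimal sketch of how such a step could look, assuming a generic best-of-N scheme: sample several actions from the prior, add a small residual, and keep the candidate with the highest critic value. Every name here (`select_action`, `toy_prior`, `toy_residual`, `toy_q`, `residual_scale`, `num_candidates`) is hypothetical and not taken from the paper or its code.

```python
import numpy as np


def select_action(obs, prior_sample, residual, q_value,
                  num_candidates=8, residual_scale=0.1, rng=None):
    """Best-of-N selection over residual-corrected prior samples.

    Illustrative only: an assumed scheme, not the authors' algorithm.
    """
    rng = np.random.default_rng() if rng is None else rng
    candidates = []
    for _ in range(num_candidates):
        base = prior_sample(obs, rng)                 # sample from the pretrained prior
        delta = residual_scale * residual(obs, base)  # small correction on top of it
        candidates.append(np.clip(base + delta, -1.0, 1.0))
    candidates = np.stack(candidates)
    # Score each candidate with the critic and keep the highest-valued one.
    scores = np.array([q_value(obs, a) for a in candidates])
    best = candidates[int(np.argmax(scores))]
    return best, candidates, scores


# Toy stand-ins (assumptions) so the sketch runs end to end.
TARGET = np.array([0.5, -0.2])


def toy_prior(obs, rng):
    # Broad Gaussian "behavior prior" over 2-D actions.
    return rng.normal(loc=0.0, scale=0.5, size=2)


def toy_residual(obs, base):
    # An untrained residual: no correction yet.
    return np.zeros_like(base)


def toy_q(obs, action):
    # Critic that prefers actions close to TARGET.
    return -float(np.sum((action - TARGET) ** 2))


obs = np.zeros(4)
best, candidates, scores = select_action(
    obs, toy_prior, toy_residual, toy_q, rng=np.random.default_rng(0)
)
```

In this toy setting, picking the highest-scoring candidate concentrates the executed actions near `TARGET` while the prior itself stays broad, which is one intuitive reading of "distribution contraction"; the paper's actual operator and regularization may differ.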


Code & Models

Models

🤗 wintermelontree/robomimic-pretrain-checkpoints · model · ♡ 2


Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning