TL;DR
This paper introduces Policy Contrastive Decoding (PCD), a training-free method that improves robotic policies by emphasizing object-relevant visual cues, enhancing generalization in simulation and real-world tasks.
Contribution
The paper proposes a novel, training-free contrastive decoding approach that can be plugged into existing robot policies to improve their focus and generalization without finetuning.
Findings
PCD improves the performance of existing policies in simulation and real-world environments.
PCD enhances the state-of-the-art policy $\pi_0$ by 8.9% in simulation.
PCD boosts real-world policy performance by 108%.
Abstract
Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose and dexterous robotic systems. Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, our PCD can be used as a plugin to improve different types of robot policies without needing to finetune or access model weights. We conduct extensive experiments on top of three open-source robot policies, including the…
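The contrast between action distributions described in the abstract can be illustrated with a minimal sketch. The exact PCD formula is not given in this summary, so the code below assumes the log-ratio form commonly used in contrastive decoding for language models, with a discrete action space. The function name `contrastive_action_distribution`, the weight `alpha`, and the example probabilities are hypothetical and are not taken from the paper.

```python
import numpy as np


def contrastive_action_distribution(p_original, p_masked, alpha=1.0, eps=1e-8):
    """Combine two action distributions in a contrastive-decoding style.

    ASSUMPTION: this uses the generic form
        p_cd(a) proportional to p_original(a)^(1 + alpha) / p_masked(a)^alpha,
    which may differ from the formula used in the PCD paper.

    p_original: action probabilities from the unmodified observation.
    p_masked:   action probabilities from the object-masked observation.
    alpha:      hypothetical contrast strength; alpha = 0 returns p_original.
    """
    p_original = np.asarray(p_original, dtype=float)
    p_masked = np.asarray(p_masked, dtype=float)

    # Work in log space; eps guards against log(0).
    log_scores = (1.0 + alpha) * np.log(p_original + eps) - alpha * np.log(p_masked + eps)

    # Subtract the maximum before exponentiating for numerical stability.
    log_scores -= log_scores.max()
    weights = np.exp(log_scores)
    return weights / weights.sum()


# Toy example with three discrete actions (made-up numbers).
# Action 0 stays likely even when the object is masked, so its support
# does not come from the object; action 1 depends on the object being visible.
p_original = np.array([0.5, 0.3, 0.2])
p_masked = np.array([0.5, 0.1, 0.4])

p_contrast = contrastive_action_distribution(p_original, p_masked, alpha=1.0)
best_action = int(np.argmax(p_contrast))
```

In this toy case the contrast down-weights the action whose probability is unchanged by masking and favors the action that relies on the object. Because the sketch only needs the two output distributions, it is consistent with the abstract's claim that the method works without finetuning or access to model weights. Policies that emit continuous actions would need a different way to evaluate or approximate the two distributions.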
