PALM: Enhanced Generalizability for Local Visuomotor Policies via Perception Alignment
Ruiyu Wang, Zheyu Zhuang, Danica Kragic, Florian T. Pokorny

TL;DR
PALM enhances the generalization of visuomotor policies by aligning perception to maintain invariant local actions across out-of-distribution scenarios, improving robustness in manipulation tasks.
Contribution
This paper introduces PALM, a modular perception alignment method that improves local visuomotor policy generalization without extra data or model modifications.
Findings
Limits OOD performance drops to 8% in simulation
Reduces real-world OOD performance drop to 24%
Outperforms baseline methods significantly
Abstract
Generalizing beyond the training domain in image-based behavior cloning remains challenging. Existing methods address individual axes of generalization, such as workspace shifts, viewpoint changes, and cross-embodiment transfer, yet they are typically developed in isolation and often rely on complex pipelines. We introduce PALM (Perception Alignment for Local Manipulation), which leverages the invariance of local action distributions between out-of-distribution (OOD) and demonstrated domains to address these OOD shifts concurrently, without additional input modalities, model changes, or data collection. PALM modularizes the manipulation policy into coarse global components and a local policy for fine-grained actions. We reduce the discrepancy between in-domain and OOD inputs at the local policy level by enforcing local visual focus and consistent proprioceptive representation, allowing the…
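The abstract is truncated here and does not say how local visual focus or consistent proprioception are implemented. The sketch below only illustrates the general idea under stated assumptions: the local policy sees a fixed-size image crop around a point of interest, and proprioception is expressed relative to an anchor position. The function names `crop_local_view` and `relative_proprio`, the crop size, and the choice of anchor are all hypothetical and are not taken from the paper.

```python
import numpy as np

def crop_local_view(image, center_px, size):
    """Return a size x size crop centered on center_px = (row, col).

    Hypothetical helper: the image is zero-padded so crops near the
    border keep a fixed shape.
    """
    half = size // 2
    # Pad only the two spatial axes; leave any channel axis untouched.
    pad = ((half, half), (half, half)) + ((0, 0),) * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")
    r, c = int(center_px[0]), int(center_px[1])
    # After padding, original pixel (r, c) sits at (r + half, c + half),
    # so a window starting at (r, c) is centered on it.
    return padded[r:r + size, c:c + size]

def relative_proprio(ee_pos, anchor_pos):
    """Express an end-effector position relative to an anchor position.

    Hypothetical helper: the anchor could be, for example, the position
    where the local policy takes over from the global component.
    """
    return np.asarray(ee_pos, dtype=float) - np.asarray(anchor_pos, dtype=float)

# Two synthetic scenes that differ only by a workspace shift of the object.
scene_a = np.zeros((12, 12))
scene_b = np.zeros((12, 12))
scene_a[2:4, 2:4] = 1.0   # object near the top-left
scene_b[7:9, 6:8] = 1.0   # same object, shifted

crop_a = crop_local_view(scene_a, (2, 2), 5)
crop_b = crop_local_view(scene_b, (7, 6), 5)
same_local_input = np.array_equal(crop_a, crop_b)

offset_a = relative_proprio([0.50, 0.20, 0.30], [0.40, 0.20, 0.10])
offset_b = relative_proprio([0.90, 0.60, 0.30], [0.80, 0.60, 0.10])
```

In this toy setup the two crops are identical and the two relative offsets agree, even though the global scenes and absolute positions differ. That is the kind of input alignment that would let a local policy reuse the same action distribution across a workspace shift.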
Taxonomy
Topics: Zebrafish Biomedical Research Applications · Human Pose and Action Recognition · Reinforcement Learning in Robotics
