OmniD: Generalizable Robot Manipulation Policy via Image-Based BEV Representation
Jilei Mao, Jiarui Guan, Yingjuan Tang, Qirui Hu, Zhihang Li, Junjie Yu, Yongjie Mao, Yunzhe Sun, Shuang Liu, Xiaozhu Ju

TL;DR
OmniD introduces a multi-view fusion framework that synthesizes image observations into a unified bird's-eye view (BEV) representation, improving the generalization of robot manipulation policies in in-distribution, out-of-distribution, and few-shot settings.
Contribution
The paper presents OmniD, a novel deformable attention-based multi-view fusion method that enhances 3D representation and out-of-distribution generalization in visuomotor policies.
Findings
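Achieves an 11% average improvement over the best baseline on in-distribution tasks
Achieves a 17% average improvement over the best baseline on out-of-distribution tasks
Achieves an 84% average improvement over the best baseline in few-shot learning scenarios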
Abstract
Visuomotor policies easily overfit to characteristics of their training data, such as fixed camera positions and backgrounds. As a result, a policy performs well in in-distribution scenarios but generalizes poorly to out-of-distribution ones. Existing methods also have difficulty fusing multi-view information into an effective 3D representation. To tackle these issues, we propose Omni-Vision Diffusion Policy (OmniD), a multi-view fusion framework that synthesizes image observations into a unified bird's-eye view (BEV) representation. We introduce a deformable attention-based Omni-Feature Generator (OFG) to selectively abstract task-relevant features while suppressing view-specific noise and background distractions. OmniD achieves 11%, 17%, and 84% average improvement over the best baseline model for in-distribution, out-of-distribution, and…
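To make the idea of fusing several camera views into one BEV grid concrete, here is a minimal NumPy sketch of deformable-style sampling. It is not the paper's Omni-Feature Generator: the function names (`bilinear_sample`, `fuse_views_to_bev`), the pinhole projection matrices, and the fixed offsets and weights (which a real model would predict from learned queries) are all assumptions made for illustration. Each BEV cell's 3D reference point is projected into every view, features are sampled at a few offset locations, and the results are averaged over the views in which the point is visible.

```python
import numpy as np

def bilinear_sample(feat, uv):
    """Sample feat (H, W, C) at continuous pixel coords uv (N, 2), ordered (u, v) = (x, y)."""
    H, W, _ = feat.shape
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u1] * du
    bottom = feat[v1, u0] * (1 - du) + feat[v1, u1] * du
    return top * (1 - dv) + bottom * dv

def fuse_views_to_bev(feats, proj, bev_points, offsets, weights):
    """Hypothetical fusion step (an illustration, not the paper's OFG).

    feats:      list of V feature maps, each (H, W, C)
    proj:       (V, 3, 4) camera projection matrices (assumed known)
    bev_points: (N, 3) 3D reference point for each BEV cell
    offsets:    (N, K, 2) pixel offsets per cell (learned in a real model)
    weights:    (N, K) attention weights, summing to 1 over K
    returns:    (N, C) fused BEV features
    """
    N = bev_points.shape[0]
    C = feats[0].shape[2]
    homog = np.concatenate([bev_points, np.ones((N, 1))], axis=1)  # (N, 4)
    fused = np.zeros((N, C))
    hits = np.zeros((N, 1))
    for feat, P in zip(feats, proj):
        H, W, _ = feat.shape
        cam = homog @ P.T                      # (N, 3): (u*d, v*d, d)
        depth = cam[:, 2]
        safe_depth = np.where(depth > 1e-6, depth, 1.0)
        uv = cam[:, :2] / safe_depth[:, None]
        # A point counts for this view only if it lies in front of the
        # camera and projects inside the feature map.
        valid = (
            (depth > 1e-6)
            & (uv[:, 0] >= 0) & (uv[:, 0] <= W - 1)
            & (uv[:, 1] >= 0) & (uv[:, 1] <= H - 1)
        )
        view_feat = np.zeros((N, C))
        for k in range(offsets.shape[1]):
            sampled = bilinear_sample(feat, uv + offsets[:, k])
            view_feat += weights[:, k:k + 1] * sampled
        fused += view_feat * valid[:, None]
        hits += valid[:, None]
    # Average over the views that actually see each point.
    return fused / np.maximum(hits, 1)
```

In this sketch a BEV cell that no camera sees receives a zero feature vector; how unseen cells are handled in OmniD is not stated in the abstract.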
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Robot Manipulation and Learning
