OmniD: Generalizable Robot Manipulation Policy via Image-Based BEV Representation

Jilei Mao; Jiarui Guan; Yingjuan Tang; Qirui Hu; Zhihang Li; Junjie Yu; Yongjie Mao; Yunzhe Sun; Shuang Liu; Xiaozhu Ju

arXiv:2508.11898 · cs.RO · August 19, 2025

TL;DR

OmniD is a multi-view fusion framework that synthesizes image observations into a unified bird's-eye view (BEV) representation, improving the generalization of robot manipulation policies in in-distribution, out-of-distribution, and few-shot settings.

Contribution

The paper presents OmniD, a multi-view fusion method built on a deformable attention-based Omni-Feature Generator (OFG), which yields a stronger 3D representation and better out-of-distribution generalization in visuomotor policies.

Findings

01. 11% average improvement over the best baseline on in-distribution tasks

02. 17% average improvement over the best baseline on out-of-distribution tasks

03. 84% average improvement over the best baseline in few-shot learning scenarios

Abstract

Visuomotor policies can easily overfit to properties of their training data, such as fixed camera positions and backgrounds. An overfit policy performs well in in-distribution scenarios but generalizes poorly to out-of-distribution ones. Existing methods also have difficulty fusing multi-view information into an effective 3D representation. To tackle these issues, we propose Omni-Vision Diffusion Policy (OmniD), a multi-view fusion framework that synthesizes image observations into a unified bird's-eye view (BEV) representation. We introduce a deformable attention-based Omni-Feature Generator (OFG) to selectively abstract task-relevant features while suppressing view-specific noise and background distractions. OmniD achieves 11%, 17%, and 84% average improvement over the best baseline model for in-distribution, out-of-distribution, and…
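
As a rough illustration of what fusing several camera views into one BEV grid involves, the NumPy sketch below projects BEV cell centers into each camera and averages the image features found there. It is not the paper's Omni-Feature Generator: OFG is described as deformable attention-based, whereas this sketch swaps in fixed nearest-pixel sampling and uniform averaging. The function names (`project_points`, `fuse_views_to_bev`), the pinhole camera model, and the single fixed sampling height are all assumptions made for illustration.

```python
import numpy as np


def project_points(points_world, K, R, t):
    """Project (N, 3) world points to pixels with a pinhole camera (assumed model).

    R and t map world to camera coordinates: x_cam = R @ x_world + t.
    Returns (N, 2) pixel coordinates and an (N,) mask of points in front of the camera.
    """
    cam = points_world @ R.T + t                # (N, 3) camera-frame points
    in_front = cam[:, 2] > 1e-6
    depth = np.where(in_front, cam[:, 2], 1.0)  # placeholder depth avoids division by zero
    uv = (cam @ K.T)[:, :2] / depth[:, None]
    return uv, in_front


def fuse_views_to_bev(feature_maps, cameras, bev_xy, height=0.0):
    """Average per-view features sampled at the projections of BEV cell centers.

    feature_maps: list of (H, W, C) arrays, one per camera.
    cameras: list of (K, R, t) tuples, one per camera.
    bev_xy: (N, 2) BEV cell centers in world x-y; all cells share one height (assumption).
    Returns (N, C) fused features and an (N,) count of views that saw each cell.
    """
    n = bev_xy.shape[0]
    channels = feature_maps[0].shape[2]
    points = np.concatenate([bev_xy, np.full((n, 1), height)], axis=1)
    fused = np.zeros((n, channels))
    counts = np.zeros(n)
    for feat, (K, R, t) in zip(feature_maps, cameras):
        h, w, _ = feat.shape
        uv, in_front = project_points(points, K, R, t)
        cols = np.rint(uv[:, 0]).astype(int)    # nearest-pixel lookup, not learned sampling
        rows = np.rint(uv[:, 1]).astype(int)
        valid = in_front & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
        fused[valid] += feat[rows[valid], cols[valid]]
        counts[valid] += 1
    seen = counts > 0
    fused[seen] /= counts[seen, None]           # uniform average over the views that saw the cell
    return fused, counts
```

A learned fusion module would typically replace the fixed pixel lookup with predicted sampling offsets and the uniform average with attention weights; cells that no camera observes are left as zeros here.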

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Robot Manipulation and Learning