OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
Mingjie Pan, Jiyao Zhang, Tianshu Wu, Yinghao Zhao, Wenlong Gao, Hao Dong

TL;DR
OmniManip introduces an object-centric approach that leverages spatial constraints and interaction primitives to enable general robotic manipulation without fine-tuning Vision-Language Models, achieving strong zero-shot generalization.
Contribution
The paper presents a novel object-centric representation and a dual closed-loop system that bridges high-level reasoning and low-level control for general manipulation tasks.
Findings
Strong zero-shot generalization across diverse tasks
Robust real-time control without VLM fine-tuning
Effective automation of large-scale simulation data generation
Abstract
The development of general robotic systems capable of manipulating in unstructured environments is a significant challenge. While Vision-Language Models (VLMs) excel in high-level commonsense reasoning, they lack the fine-grained 3D spatial understanding required for precise manipulation tasks. Fine-tuning VLMs on robotic datasets to create Vision-Language-Action Models (VLAs) is a potential solution, but it is hindered by high data collection costs and generalization issues. To address these challenges, we propose a novel object-centric representation that bridges the gap between VLMs' high-level reasoning and the low-level precision required for manipulation. Our key insight is that an object's canonical space, defined by its functional affordances, provides a structured and semantically meaningful way to describe interaction primitives, such as points and directions. These primitives act…
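The idea of interaction primitives (points and directions) defined in an object's canonical space can be illustrated with a minimal sketch. The code below is not the paper's implementation. It assumes a known object pose (rotation R, translation t) and a hypothetical squared-error cost, and the names to_world and constraint_cost are introduced here for illustration only.

```python
import math

def mat_vec(R, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def to_world(R, t, point_c, direction_c):
    """Map a canonical-frame point and direction into the world frame.

    R, t: assumed object pose (rotation matrix and translation).
    Points are rotated and translated; directions are only rotated,
    then normalized to unit length.
    """
    point_w = [a + b for a, b in zip(mat_vec(R, point_c), t)]
    d = mat_vec(R, direction_c)
    norm = math.sqrt(sum(x * x for x in d))
    return point_w, [x / norm for x in d]

def constraint_cost(p_a, d_a, p_b, d_b, target_dist=0.0, target_cos=1.0):
    """Hypothetical spatial-constraint cost between two primitives.

    Penalizes deviation from a target distance between the two points
    and a target cosine between the two unit directions. This is an
    illustrative choice, not the formulation used in the paper.
    """
    dist = math.dist(p_a, p_b)
    cos = sum(x * y for x, y in zip(d_a, d_b))
    return (dist - target_dist) ** 2 + (cos - target_cos) ** 2

# Example: a primitive on an object rotated 90 degrees about z and lifted 1 m.
R_z90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 1.0]
p_w, d_w = to_world(R_z90, t, [1.0, 0.0, 0.0], [2.0, 0.0, 0.0])
cost = constraint_cost(p_w, d_w, p_w, d_w)  # identical primitives satisfy the default targets
```

In a sketch like this, a planner would minimize the cost over the pose of the manipulated object, so the constraint is expressed in terms of object geometry and not robot joint angles.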
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Robotic Path Planning Algorithms · Robot Manipulation and Learning
