InCoM: Intent-Driven Perception and Structured Coordination for Mobile Manipulation
Jiahao Liu, Cui Wenbo, Zhongpu Xia, Haoran Li, Dongbin Zhao

TL;DR
InCoM is an intent-driven perception and structured coordination framework for mobile manipulation. It infers motion intent to reallocate perceptual attention as viewpoints shift and decouples base and arm control, raising success rates by over 23% in ManiSkill-HAB scenarios.
Contribution
The paper presents a novel framework combining intent inference, multi-scale perceptual reweighting, and decoupled control modeling to enhance mobile manipulation capabilities.
Findings
InCoM achieves success rate improvements of over 23% in ManiSkill-HAB scenarios.
In real-world tasks, the framework also achieves higher success rates than the baselines.
Experimental results validate the effectiveness of intent-driven perception and decoupled control.
Abstract
Mobile manipulation is a fundamental capability for general-purpose robotic agents, requiring both coordinated control of the mobile base and manipulator and robust perception under dynamically changing viewpoints. However, existing approaches face two key challenges: strong coupling between base and arm actions complicates control optimization, and perceptual attention is often poorly allocated as viewpoints shift during mobile manipulation. We propose InCoM, an intent-driven perception and structured coordination framework for mobile manipulation. InCoM infers latent motion intent to dynamically reweight multi-scale perceptual features, enabling stage-adaptive allocation of perceptual attention. To support robust cross-modal perception, InCoM further incorporates a geometric-semantic structured alignment mechanism that enhances multimodal correspondence. On the control side, we design…
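To make the idea of intent-driven reweighting concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how intent is represented or how weights are computed, so the dot-product scoring, the softmax over scales, and every name below (`reweight_scales`, `scale_projections`, and so on) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_scales(intent, scale_features, scale_projections):
    """Fuse per-scale features using weights derived from an intent vector.

    intent: latent intent vector (assumed given; how it is inferred is
        not modeled here).
    scale_features: one feature vector per perceptual scale, all of the
        same length.
    scale_projections: one vector per scale; its dot product with the
        intent scores how relevant that scale is (an assumed scoring rule).
    """
    # Score each scale by its relevance to the current intent.
    scores = [
        sum(p * z for p, z in zip(projection, intent))
        for projection in scale_projections
    ]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Weighted sum of the per-scale features, dimension by dimension.
    dim = len(scale_features[0])
    fused = [
        sum(w * feats[d] for w, feats in zip(weights, scale_features))
        for d in range(dim)
    ]
    return weights, fused


# Toy example: two scales, an intent that favors the first one.
intent = [1.0, 0.0]
scale_projections = [[2.0, 0.0], [0.0, 2.0]]
scale_features = [[1.0, 0.0], [0.0, 1.0]]
weights, fused = reweight_scales(intent, scale_features, scale_projections)
```

In this toy setup the first scale receives the larger weight because its projection aligns with the intent vector. A learned model would replace the fixed projections with trained parameters and would operate on feature maps rather than short vectors.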
