HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions
Zhenhao Shen, Zeming Yang, Yue Chen, Yuran Wang, Shengqiang Xu, Mingleyang Li, Hao Dong, Ruihai Wu

TL;DR
HeteroGenManip introduces a two-stage, task-conditioned framework for robotic manipulation of heterogeneous objects, improving generalization and reducing error accumulation in complex, cross-category interactions.
Contribution
It proposes a novel decoupled approach with a grasp module and a category-specific diffusion policy, enhancing robustness and generalization in manipulation tasks.
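To make the decoupling concrete, here is a minimal structural sketch, not the authors' implementation. It assumes stage one returns a single contact point and stage two is chosen from a per-category registry. All names (`Observation`, `select_grasp_point`, `POLICIES`, `run_two_stage`) and the category labels "rigid" and "deformable" are hypothetical. The centroid heuristic and straight-line lifts are simple placeholders standing in for the paper's grasp module and diffusion policies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Observation:
    category: str        # hypothetical object-category label
    points: List[Point]  # hypothetical object point cloud


def select_grasp_point(obs: Observation) -> Point:
    """Stage 1 placeholder ("where to manipulate"): centroid of the points."""
    n = len(obs.points)
    return tuple(sum(p[i] for p in obs.points) / n for i in range(3))


def straight_lift(start: Point, steps: int = 3, dz: float = 0.1) -> List[Point]:
    """Stage 2 placeholder ("how to manipulate"): waypoints straight up."""
    x, y, z = start
    return [(x, y, z + dz * k) for k in range(1, steps + 1)]


def slow_lift(start: Point) -> List[Point]:
    """A second placeholder policy that uses smaller vertical steps."""
    return straight_lift(start, steps=3, dz=0.05)


# Assumed registry: one trajectory policy per object category.
POLICIES: Dict[str, Callable[[Point], List[Point]]] = {
    "rigid": straight_lift,
    "deformable": slow_lift,
}


def run_two_stage(obs: Observation) -> Tuple[Point, List[Point]]:
    """Choose the contact point first, then hand off to the category policy."""
    grasp = select_grasp_point(obs)
    policy = POLICIES[obs.category]
    return grasp, policy(grasp)
```

Because the second stage only sees the chosen contact point, either stage can be replaced or retrained without touching the other, which is the property the decoupled design relies on.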
Findings
Achieves a 31% performance improvement on simulation tasks.
Attains a 36.7% gain across four real-world interaction tasks.
Demonstrates robust intra-category shape and pose generalization.
Abstract
Generalizable manipulation involving cross-type object interactions is a critical yet challenging capability in robotics. To reliably accomplish such tasks, robots must address two fundamental challenges: "where to manipulate" (contact point localization) and "how to manipulate" (subsequent interaction trajectory planning). Existing foundation-model-based approaches often adopt end-to-end learning that obscures the distinction between these stages, exacerbating error accumulation in long-horizon tasks. Furthermore, they typically rely on a single uniform model, which fails to capture the diverse, category-specific features required for heterogeneous objects. To overcome these limitations, we propose HeteroGenManip, a task-conditioned, two-stage framework designed to decouple the initial grasp from complex interaction execution. First, the Foundation-Correspondence-Guided Grasp module leverages…
