Object-Centric Mobile Manipulation through SAM2-Guided Perception and Imitation Learning
Wang Zhicheng, Satoshi Yagi, Satoshi Yamamori, Jun Morimoto

TL;DR
This paper introduces an object-centric perception method that combines SAM2 with imitation learning to make mobile manipulation more robust and to help it generalize across diverse approach orientations.
Contribution
The paper presents a novel SAM2-guided perception approach that incorporates manipulation orientation information, enabling a mobile manipulator to perform the same task from varied approach angles.
Findings
The proposed model generalizes better than Action Chunking Transformer.
Robustness in pick-and-place tasks improves across different orientations.
The method is demonstrated on a custom-built mobile manipulator with varied approach angles.
Abstract
Imitation learning for mobile manipulation is a key challenge in the field of robotic manipulation. However, current mobile manipulation frameworks typically decouple navigation and manipulation, executing manipulation only after reaching a certain location. This can lead to performance degradation when navigation is imprecise, especially due to misalignment in approach angles. To enable a mobile manipulator to perform the same task from diverse orientations, an essential capability for building general-purpose robotic models, we propose an object-centric method based on SAM2, a foundation model towards solving promptable visual segmentation in images, which incorporates manipulation orientation information into our model. Our approach enables consistent understanding of the same task from different orientations. We deploy the model on a custom-built mobile manipulator and evaluate it…
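As a rough illustration of what an object-centric observation can look like, the sketch below keeps only the pixels under a segmentation mask and crops to the mask's bounding box. It assumes a binary mask has already been produced by a segmenter such as SAM2; the function name `object_centric_view` and the toy data are hypothetical, and this summary does not state how the paper builds its observations or encodes orientation, so this is not the authors' pipeline.

```python
import numpy as np


def object_centric_view(image, mask, background=0):
    """Keep only masked pixels, then crop to the mask's bounding box.

    image: (H, W, C) array; mask: (H, W) boolean array.
    Hypothetical helper for illustration; the mask is assumed to come
    from an external segmenter such as SAM2.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    mask = mask.astype(bool)
    if not mask.any():
        raise ValueError("mask is empty")
    # Replace everything outside the object with a constant background value.
    masked = np.where(mask[..., None], image, background).astype(image.dtype)
    # Rows and columns that contain at least one object pixel.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return masked[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


# Toy data: a uniform 6x8 RGB image with a 2x3 "object" region.
image = np.full((6, 8, 3), 200, dtype=np.uint8)
mask = np.zeros((6, 8), dtype=bool)
mask[2:4, 3:6] = True
view = object_centric_view(image, mask)
print(view.shape)
```

Masking out the background is one simple way to make the observation depend on the object rather than on the surrounding scene, which changes as the robot approaches from different angles.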
Taxonomy
Topics: Robot Manipulation and Learning · Robotic Path Planning Algorithms · Robotics and Sensor-Based Localization
