Precise Mobile Manipulation of Small Everyday Objects
Arjun Gupta, Rishik Sathua, Saurabh Gupta

TL;DR
This paper introduces Servoing with Vision Models (SVM), a vision-based closed-loop framework that lets a mobile manipulator precisely manipulate small objects in novel environments, with higher success rates than open-loop control and imitation learning baselines.
Contribution
The paper presents SVM, a framework that uses vision foundation models to generate 3D targets for visual servoing and out-paints the end-effector to remove the occlusion that otherwise degrades target localization, enabling precise manipulation of small objects in diverse environments.
Findings
Achieves a 71% zero-shot success rate in real-world tests.
Outperforms open-loop control by 42% in success rate.
Surpasses imitation learning baseline by 50% in success rate.
Abstract
Many everyday mobile manipulation tasks require precise interaction with small objects, such as grasping a knob to open a cabinet or pressing a light switch. In this paper, we develop Servoing with Vision Models (SVM), a closed-loop framework that enables a mobile manipulator to tackle such precise tasks involving the manipulation of small objects. SVM uses state-of-the-art vision foundation models to generate 3D targets for visual servoing to enable diverse tasks in novel environments. Naively doing so fails because of occlusion by the end-effector. SVM mitigates this using vision models that out-paint the end-effector, thereby significantly enhancing target localization. We demonstrate that aided by out-painting methods, open-vocabulary object detectors can serve as a drop-in module for SVM to seek semantic targets (e.g. knobs) and point tracking methods can help SVM reliably pursue…
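The closed-loop idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`deproject`, `servo_step`, `run_servo`), the gain, the step limit, and the tolerance are all hypothetical, and the vision-model detection and out-painting stages are replaced by a stand-in that simply reads off the true target. It assumes a pinhole camera model and a camera that only translates, with axes aligned to the world frame.

```python
import numpy as np

def deproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into the camera frame
    using the standard pinhole model. Intrinsics are assumed known."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def servo_step(error, gain=0.5, max_step=0.05):
    """One proportional servoing step: move a fraction of the remaining
    error, clipped to max_step metres. gain and max_step are assumptions."""
    step = gain * np.asarray(error, dtype=float)
    norm = np.linalg.norm(step)
    if norm > max_step:
        step = step * (max_step / norm)
    return step

def run_servo(target_world, start_world, tol=0.005, max_iters=200):
    """Toy closed loop. In a real system the error would be re-estimated
    every iteration from a fresh (out-painted) image, a detector or point
    tracker, and depth; here it is computed directly as a stand-in."""
    target = np.asarray(target_world, dtype=float)
    cam = np.asarray(start_world, dtype=float)
    for i in range(max_iters):
        error = target - cam  # stand-in for detect + deproject
        if np.linalg.norm(error) < tol:
            return cam, i
        cam = cam + servo_step(error)
    return cam, max_iters

if __name__ == "__main__":
    # Hypothetical intrinsics and detection: pixel (420, 200) at 0.4 m depth.
    target = deproject(420, 200, 0.4, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    final, iters = run_servo(target, start_world=[0.0, 0.0, 0.0])
    print(f"stopped after {iters} iterations, "
          f"residual {np.linalg.norm(target - final):.4f} m")
```

Because the error is recomputed on every iteration, a poor early target estimate is corrected as the camera approaches, which is the practical advantage of closed-loop servoing over a single open-loop motion plan.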
Taxonomy
Topics: Human Motion and Animation · Interactive and Immersive Displays · Robot Manipulation and Learning
Methods: Context Aggregated Bi-lateral Network for Semantic Segmentation · Support Vector Machine
