Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation

Tzu-Jung Lin; Jia-Fong Yeh; Hung-Ting Su; Chung-Yi Lin; Yi-Ting Chen; Winston H. Hsu

arXiv:2511.06240 · cs.RO · January 6, 2026

Open Access

TL;DR

This paper introduces a zero-shot, affordance-guided approach for base placement in mobile manipulation, combining vision-language models with geometric reasoning to improve task success across diverse scenarios.

Contribution

It presents a framework for base placement in open-vocabulary mobile manipulation that integrates VLM-derived semantic affordances with geometric constraints, outperforming classical geometric and VLM-based planners.

Findings

01

Achieves an 85% success rate on five manipulation tasks.

02

Outperforms classical geometric and VLM-based planners.

03

Demonstrates effective multimodal reasoning for generalizable planning.

Abstract

In open-vocabulary mobile manipulation (OVMM), task success often hinges on the selection of an appropriate base placement for the robot. Existing approaches typically navigate to proximity-based regions without considering affordances, resulting in frequent manipulation failures. We propose Affordance-Guided Coarse-to-Fine Exploration, a zero-shot framework for base placement that integrates semantic understanding from vision-language models (VLMs) with geometric feasibility through an iterative optimization process. Our method constructs cross-modal representations, namely Affordance RGB and Obstacle Map+, to align semantics with spatial context. This enables reasoning that extends beyond the egocentric limitations of RGB perception. To ensure interaction is guided by task-relevant affordances, we leverage coarse semantic priors from VLMs to guide the search toward task-relevant…
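The abstract describes the search only at a high level, so what follows is a minimal, hypothetical sketch of a generic coarse-to-fine base-placement loop, not the authors' implementation. It assumes a 2D plane, point obstacles in place of Obstacle Map+, a hand-written `prefer_front` function standing in for the VLM's coarse semantic prior, and illustrative `clearance` and `reach` thresholds for geometric feasibility. All function and parameter names here are invented for illustration. Each level samples a small grid around the current best pose, discards geometrically infeasible candidates, re-centers on the highest-scoring feasible one, and halves the grid spacing.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def is_feasible(p: Point, obstacles: List[Point], target: Point,
                clearance: float, reach: Tuple[float, float]) -> bool:
    """Hypothetical geometric check: the base keeps `clearance` from every
    obstacle point and the target lies within the arm's [min, max] reach."""
    if any(math.dist(p, o) < clearance for o in obstacles):
        return False
    return reach[0] <= math.dist(p, target) <= reach[1]


def coarse_to_fine_search(target: Point,
                          obstacles: List[Point],
                          prior: Callable[[Point], float],
                          step: float = 0.4,
                          half_width: int = 2,
                          levels: int = 3,
                          clearance: float = 0.3,
                          reach: Tuple[float, float] = (0.3, 0.8)) -> Optional[Point]:
    """Return the best feasible base position found, or None if no sampled
    candidate passes the geometric check at any level."""
    center: Point = target
    best: Optional[Point] = None
    for _ in range(levels):
        # (2 * half_width + 1)^2 grid of candidates around the current center.
        offsets = [k * step for k in range(-half_width, half_width + 1)]
        candidates = [(center[0] + dx, center[1] + dy)
                      for dx in offsets for dy in offsets]
        # Geometric feasibility filters candidates before the prior scores them.
        feasible = [p for p in candidates
                    if is_feasible(p, obstacles, target, clearance, reach)]
        if feasible:
            level_best = max(feasible, key=prior)
            if best is None or prior(level_best) > prior(best):
                best = level_best
            center = best  # refine around the best pose found so far
        step /= 2  # finer grid at the next level
    return best


def prefer_front(p: Point) -> float:
    """Hypothetical stand-in for a VLM semantic score: positions closer to
    (0.0, -0.52), i.e. approaching the target from the -y side, score higher."""
    return -math.dist(p, (0.0, -0.52))


if __name__ == "__main__":
    print(coarse_to_fine_search((0.0, 0.0), [(0.0, 0.5)], prefer_front))
```

With a target at the origin and an obstacle at `(0.0, 0.5)`, the search moves from `(0.0, -0.4)` at the coarsest level to roughly `(0.0, -0.5)` at the finest. Filtering before scoring means the semantic prior can only rank poses that already satisfy the clearance and reach constraints, which mirrors the abstract's pairing of semantic priors with geometric feasibility.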


Taxonomy

Topics: Robot Manipulation and Learning · Robotic Path Planning Algorithms · Multimodal Machine Learning Applications