BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands
Seongwon Cho, Daechul Ahn, Donghyun Shin, Hyeonbeom Choi, San Kim, Jonghyun Choi

TL;DR
BINDER is a dual-process framework that enables robots to adapt instantly to dynamic environments by combining strategic planning with continuous environment monitoring using multimodal large language models.
Contribution
It introduces a dual-process approach that decouples strategic planning from continuous environment monitoring, improving robustness and efficiency in open-vocabulary mobile manipulation.
Findings
BINDER outperforms state-of-the-art methods in success rate and efficiency.
It effectively handles dynamic object placement in real-world environments.
The framework demonstrates robust adaptation in three different real-world settings.
Abstract
Open-vocabulary mobile manipulation (OVMM) requires robots to follow language instructions, navigate, and manipulate while updating their world representation under dynamic environmental changes. However, most prior approaches update their world representation only at discrete update points such as navigation targets, waypoints, or the end of an action step, leaving robots blind between updates and causing cascading failures: overlooked objects, late error detection, and delayed replanning. To address this limitation, we propose BINDER (Bridging INstant and DEliberative Reasoning), a dual process framework that decouples strategic planning from continuous environment monitoring. Specifically, BINDER integrates a Deliberative Response Module (DRM, a multimodal LLM for task planning) with an Instant Response Module (IRM, a VideoLLM for continuous monitoring). The two modules play…
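The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `Event` type, and the use of plain strings as camera frames are all hypothetical, and simple rules stand in for the DRM (a multimodal LLM) and the IRM (a VideoLLM). The only point is that the monitor inspects every frame, so a change can trigger replanning immediately instead of waiting for the next discrete update point.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Event:
    """A change noticed by the monitor (hypothetical type for this sketch)."""
    frame_idx: int
    description: str


def deliberative_plan(instruction: str, events: List[Event]) -> List[str]:
    """Stand-in for the DRM: build a step list from the instruction.

    A real DRM would query a multimodal LLM; here the most recent
    event is simply placed in front of a fixed two-step plan.
    """
    steps = [f"navigate for: {instruction}", f"manipulate for: {instruction}"]
    if events:
        steps.insert(0, f"handle: {events[-1].description}")
    return steps


def instant_monitor(frame: str) -> Optional[str]:
    """Stand-in for the IRM: flag frames that report a relevant change.

    Assumption: a frame is a string, and relevant changes are marked
    with the prefix "change:". A real IRM would process video.
    """
    return frame if frame.startswith("change:") else None


def run_episode(
    instruction: str, frames: Iterable[str]
) -> Tuple[List[str], int, List[Event]]:
    """Check every frame and replan as soon as the monitor flags a change."""
    events: List[Event] = []
    plan = deliberative_plan(instruction, events)
    replans = 0
    for idx, frame in enumerate(frames):
        change = instant_monitor(frame)
        if change is not None:
            events.append(Event(idx, change))
            plan = deliberative_plan(instruction, events)  # immediate replan
            replans += 1
    return plan, replans, events


if __name__ == "__main__":
    frames = ["idle", "change: cup moved to table", "idle"]
    plan, replans, events = run_episode("bring the cup", frames)
    print(plan, replans, events)
```

In this toy loop the monitor and planner run sequentially for determinism; a system following the abstract's description would presumably run monitoring continuously alongside task execution.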
