BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands

Seongwon Cho; Daechul Ahn; Donghyun Shin; Hyeonbeom Choi; San Kim; Jonghyun Choi

arXiv:2511.22364 · cs.RO · April 15, 2026

TL;DR

BINDER is a dual-process framework that enables robots to adapt instantly to dynamic environments by combining strategic planning with continuous environment monitoring using multimodal large language models.

Contribution

BINDER introduces a dual-process approach that decouples strategic planning from continuous environment monitoring, improving robustness and efficiency in open-vocabulary mobile manipulation (OVMM).

Findings

01

BINDER outperforms state-of-the-art methods in success rate and efficiency.

02

It effectively handles dynamic object placement in real-world environments.

03

The framework demonstrates robust adaptation in three different real-world settings.

Abstract

Open-vocabulary mobile manipulation (OVMM) requires robots to follow language instructions, navigate, and manipulate while updating their world representation under dynamic environmental changes. However, most prior approaches update their world representation only at discrete update points such as navigation targets, waypoints, or the end of an action step, leaving robots blind between updates and causing cascading failures: overlooked objects, late error detection, and delayed replanning. To address this limitation, we propose BINDER (Bridging INstant and DEliberative Reasoning), a dual process framework that decouples strategic planning from continuous environment monitoring. Specifically, BINDER integrates a Deliberative Response Module (DRM, a multimodal LLM for task planning) with an Instant Response Module (IRM, a VideoLLM for continuous monitoring). The two modules play…
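The decoupling described in the abstract can be illustrated with a toy control loop. The sketch below is not the authors' implementation: `deliberative_plan`, `instant_monitor`, and `run_episode` are hypothetical names, the planner and monitor are simple stand-in functions rather than a multimodal LLM and a VideoLLM, and each "frame" is reduced to a set of visible object names. It only shows the general idea that a monitor inspects every frame during execution and can trigger replanning before the current plan finishes.

```python
from typing import List, Set, Tuple


def deliberative_plan(goal: str, known_objects: Set[str]) -> List[str]:
    """Hypothetical stand-in for the DRM: build an action list for the goal."""
    if goal in known_objects:
        return [f"navigate_to:{goal}", f"pick:{goal}"]
    # Goal location unknown: fall back to a fixed exploration budget.
    return ["explore", "explore", "explore"]


def instant_monitor(frame: Set[str], goal: str, known_objects: Set[str]) -> bool:
    """Hypothetical stand-in for the IRM: flag a frame that warrants replanning."""
    return goal in frame and goal not in known_objects


def run_episode(goal: str, frames: List[Set[str]]) -> Tuple[List[str], int]:
    """Execute one action per frame, checking the monitor on every frame."""
    known: Set[str] = set()
    plan = deliberative_plan(goal, known)
    log: List[str] = []
    replans = 0
    for frame in frames:
        if not plan:
            break
        # The monitor runs on every frame, not only at the end of an action.
        if instant_monitor(frame, goal, known):
            known.add(goal)
            plan = deliberative_plan(goal, known)  # replan immediately
            replans += 1
            log.append("replan")
            continue
        log.append(plan.pop(0))
    return log, replans


if __name__ == "__main__":
    # The mug becomes visible in the second frame, mid-exploration.
    log, replans = run_episode("mug", [set(), {"mug"}, {"mug"}, {"mug"}])
    print(log, replans)
```

In this toy run the target appears while the robot is still exploring, so the remaining exploration steps are dropped and replaced by a navigate-and-pick plan. A loop that only updated its world state after finishing the exploration plan would have used up the full exploration budget first.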
