MLLM-Search: A Zero-Shot Approach to Finding People using Multimodal Large Language Models

Angus Fung; Aaron Hao Tan; Haitong Wang; Beno Benhabib; Goldie Nejat

arXiv:2412.00103 · cs.RO · December 3, 2024 · 2 citations


TL;DR

MLLM-Search is a zero-shot architecture, built on multimodal large language models (MLLMs), that lets an autonomous robot search for people in dynamic, real-world environments by combining spatial understanding with semantic reasoning.

Contribution

It introduces a novel visual prompting method and spatial chain-of-thought prompting that improve robot person search without requiring complete, or any, prior knowledge of people's schedules or locations.

Findings

01

Outperforms existing search methods in efficiency

02

Successfully generalizes to unseen environments

03

Validated through extensive 3D and real-world experiments

Abstract

Robotic search for people in human-centered environments, including healthcare settings, is challenging because autonomous robots need to locate people without complete, or any, prior knowledge of their schedules, plans or locations. Furthermore, robots need to adapt to real-time events that can influence a person's plan in an environment. In this paper, we present MLLM-Search, a novel zero-shot person search architecture that leverages multimodal large language models (MLLMs) to address the mobile robot problem of searching for a person under event-driven scenarios with varying user schedules. Our approach introduces a novel visual prompting method that gives robots spatial understanding of the environment by generating a spatially grounded waypoint map, which represents navigable waypoints as a topological graph and regions by semantic labels. This is incorporated into an MLLM with…
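The abstract describes the waypoint map as a topological graph of navigable waypoints whose regions carry semantic labels. Below is a minimal Python sketch of such a data structure, written as an illustration under stated assumptions rather than the authors' implementation. The names `Waypoint`, `WaypointMap`, `in_region`, `shortest_path` and `to_prompt_text` are hypothetical. The hop-count breadth-first search and the plain-text serialization for an MLLM prompt are also assumptions. The paper's visual prompting grounds the map in images, which this sketch does not attempt.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Waypoint:
    """A navigable point in the map (hypothetical structure)."""
    wid: int
    x: float
    y: float
    region: str  # semantic region label, e.g. "kitchen"


@dataclass
class WaypointMap:
    """Hypothetical topological graph of waypoints with semantic region labels."""
    waypoints: dict[int, Waypoint] = field(default_factory=dict)
    edges: dict[int, set[int]] = field(default_factory=dict)

    def add_waypoint(self, wp: Waypoint) -> None:
        self.waypoints[wp.wid] = wp
        self.edges.setdefault(wp.wid, set())

    def connect(self, a: int, b: int) -> None:
        # Undirected edge: the robot can travel directly between a and b.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def in_region(self, region: str) -> list[int]:
        # IDs of all waypoints carrying the given semantic label.
        return sorted(w.wid for w in self.waypoints.values() if w.region == region)

    def shortest_path(self, start: int, goal: int) -> list[int] | None:
        # Breadth-first search: fewest hops on the graph, ignoring metric distance.
        prev: dict[int, int | None] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                cur: int | None = node
                while cur is not None:
                    path.append(cur)
                    cur = prev[cur]
                return path[::-1]
            for nbr in sorted(self.edges[node]):
                if nbr not in prev:
                    prev[nbr] = node
                    queue.append(nbr)
        return None  # goal is unreachable from start

    def to_prompt_text(self) -> str:
        # Assumed plain-text serialization that could be placed in an MLLM prompt.
        lines = []
        for wid in sorted(self.waypoints):
            wp = self.waypoints[wid]
            nbrs = ", ".join(str(n) for n in sorted(self.edges[wid]))
            lines.append(f"Waypoint {wid} ({wp.region}) at ({wp.x:.1f}, {wp.y:.1f}) -> [{nbrs}]")
        return "\n".join(lines)


# Usage: a toy map with a hallway connecting a kitchen and an office.
m = WaypointMap()
for wp in [Waypoint(0, 0.0, 0.0, "hallway"), Waypoint(1, 2.0, 0.0, "hallway"),
           Waypoint(2, 2.0, 3.0, "kitchen"), Waypoint(3, 5.0, 0.0, "office")]:
    m.add_waypoint(wp)
m.connect(0, 1)
m.connect(1, 2)
m.connect(1, 3)
print(m.shortest_path(0, 2))  # [0, 1, 2]
print(m.in_region("hallway"))  # [0, 1]
print(m.to_prompt_text())
```

Keeping the semantic label on each node lets a planner ask region-level questions, such as which waypoints lie in the kitchen, while still routing over the graph's connectivity. A real system would likely weight edges by metric distance instead of counting hops.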
