Warehouse Spatial Question Answering with LLM Agent

Hsiang-Wei Huang; Jen-Hao Cheng; Kuang-Ming Chen; Cheng-Yen Yang; Bahaa Alattar; Yi-Ru Lin; Pyongkun Kim; Sangwon Kim; Kwangju Kim; Chung-I Huang; Jenq-Neng Hwang

arXiv:2507.10778 · cs.CV · August 15, 2025

TL;DR

This paper introduces a data-efficient LLM agent system capable of complex spatial reasoning in warehouse environments, outperforming existing models in accuracy and efficiency for spatial question answering tasks.

Contribution

The paper presents a novel LLM agent framework with integrated tools for advanced spatial reasoning in warehouse scenarios, reducing reliance on extensive fine-tuning.

Findings

01. High accuracy in object retrieval, counting, and distance estimation
02. Effective spatial reasoning in complex indoor warehouse scenarios
03. Outperforms existing methods on the 2025 AI City Challenge dataset

Abstract

Spatial understanding has been a challenging task for existing Multi-modal Large Language Models (MLLMs). Previous methods rely on large-scale finetuning to enhance an MLLM's spatial understanding ability. In this paper, we present a data-efficient approach. We propose an LLM agent system with strong spatial reasoning ability that can solve the challenging spatial question answering task in complex indoor warehouse scenarios. Our system integrates multiple tools that allow the LLM agent to conduct spatial reasoning and interact with API tools to answer complicated spatial questions. Extensive evaluations on the 2025 AI City Challenge Physical AI Spatial Intelligence Warehouse dataset demonstrate that our system achieves high accuracy and efficiency in tasks such as object retrieval, counting, and distance estimation. The code is available at:…
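To make the tool-calling idea in the abstract concrete, here is a minimal Python sketch of how an agent might dispatch structured tool calls for distance estimation, counting, and retrieval. It is an illustration under stated assumptions, not the authors' implementation: the `SceneObject` type, the three tool functions, the call format, and the toy scene are all hypothetical, and the LLM that would normally produce the calls is replaced by hand-written dictionaries.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical scene object: a category plus a 3D centroid in meters."""
    name: str
    category: str
    xyz: tuple


def distance(a, b):
    """Euclidean distance between two object centroids."""
    return math.dist(a.xyz, b.xyz)


def count(objects, category):
    """Number of objects in the scene with the given category."""
    return sum(1 for o in objects if o.category == category)


def closest(objects, anchor, category):
    """Object of the given category nearest to the anchor object."""
    candidates = [o for o in objects if o.category == category and o is not anchor]
    return min(candidates, key=lambda o: distance(anchor, o))


def run_tool_call(scene, call):
    """Dispatch one call of the assumed form {"tool": name, "args": {...}}."""
    name, args = call["tool"], call["args"]
    if name == "distance":
        return distance(scene[args["a"]], scene[args["b"]])
    if name == "count":
        return count(scene.values(), args["category"])
    if name == "closest":
        return closest(scene.values(), scene[args["anchor"]], args["category"]).name
    raise ValueError(f"unknown tool: {name}")


# Toy scene with made-up coordinates, for illustration only.
scene = {
    o.name: o
    for o in [
        SceneObject("pallet_1", "pallet", (0.0, 0.0, 0.0)),
        SceneObject("pallet_2", "pallet", (3.0, 4.0, 0.0)),
        SceneObject("pallet_3", "pallet", (1.0, 0.0, 0.0)),
        SceneObject("forklift_1", "forklift", (1.0, 1.0, 0.0)),
    ]
}

# In an agent system the LLM would emit calls like these; here they are hard-coded.
d = run_tool_call(scene, {"tool": "distance", "args": {"a": "pallet_1", "b": "pallet_2"}})
n = run_tool_call(scene, {"tool": "count", "args": {"category": "pallet"}})
c = run_tool_call(scene, {"tool": "closest", "args": {"anchor": "forklift_1", "category": "pallet"}})
```

In this sketch the language model only decides which tool to call and with which arguments, while the geometry is computed deterministically. That separation is one plausible reason an agent design could need less fine-tuning data than training an MLLM to regress distances directly.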

Code & Models

Repositories

hsiangwei0903/spatialagent (PyTorch, official)
