IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation
Yifan Li, Lichi Li, Anh Dao, Xinyu Zhou, Yicheng Qiao, Zheda Mai, Daeun Lee, Zichen Chen, Zhen Tan, Mohit Bansal, Yu Kong

TL;DR
IndustryNav introduces a novel benchmark for evaluating embodied agents' spatial reasoning in dynamic industrial environments, highlighting current models' limitations in path planning, collision avoidance, and active exploration.
Contribution
This work presents IndustryNav, the first dynamic industrial navigation benchmark with high-fidelity scenarios, new safety metrics, and an evaluation of state-of-the-art VLLMs in complex environments.
Findings
Closed-source models outperform open-source ones.
All models struggle with robust path planning.
Agents show deficiencies in collision avoidance and exploration.
Abstract
While Visual Large Language Models (VLLMs) show great promise as embodied agents, they continue to face substantial challenges in spatial reasoning. Existing embodied benchmarks largely focus on passive, static household environments and evaluate only isolated capabilities, failing to capture holistic performance in dynamic, real-world complexity. To fill this gap, we present IndustryNav, the first dynamic industrial navigation benchmark for active spatial reasoning. IndustryNav leverages 12 manually created, high-fidelity Unity warehouse scenarios featuring dynamic objects and human movement. Our evaluation employs a PointGoal navigation pipeline that effectively combines egocentric vision with global odometry to assess holistic local-global planning. Crucially, we introduce the "collision rate" and "warning rate" metrics to measure safety-oriented behaviors and distance estimation. A…
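The abstract names the "collision rate" and "warning rate" metrics, but the truncated text here does not define them. As a rough illustration only, the sketch below assumes both are per-step fractions over an episode: a step counts as a collision when the agent's distance to the nearest obstacle is zero or less, and as a warning when that distance falls inside a safety margin without contact. The `Step` record, the `safety_rates` function, and the 0.5 m margin are hypothetical names and values chosen for this example, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Step:
    # Distance in metres from the agent to the nearest obstacle at this step.
    # Hypothetical field; the benchmark's actual log format is not given here.
    min_obstacle_distance: float


def safety_rates(
    steps: Sequence[Step],
    collision_dist: float = 0.0,  # assumed: contact means distance <= 0
    warning_dist: float = 0.5,    # assumed safety margin, not from the paper
) -> Tuple[float, float]:
    """Return (collision_rate, warning_rate) as fractions of episode steps."""
    if not steps:
        return 0.0, 0.0
    collisions = sum(
        1 for s in steps if s.min_obstacle_distance <= collision_dist
    )
    # A warning is a near miss: inside the margin but not in contact.
    warnings = sum(
        1
        for s in steps
        if collision_dist < s.min_obstacle_distance <= warning_dist
    )
    n = len(steps)
    return collisions / n, warnings / n


if __name__ == "__main__":
    episode = [Step(2.0), Step(0.4), Step(0.0), Step(1.0)]
    collision_rate, warning_rate = safety_rates(episode)
    print(collision_rate, warning_rate)
```

Under these assumptions, keeping the two counts disjoint separates actual contact from unsafe proximity, which is one plausible way to probe distance estimation independently of outright failure.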
Taxonomy
Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Social Robot Interaction and HRI
