CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space

Yong Zhao; Kai Xu; Zhengqiu Zhu; Yue Hu; Zhiheng Zheng; Yingfeng Chen; Yatai Ji; Chen Gao; Yong Li; Jincai Huang

arXiv:2502.12532 · cs.AI · May 23, 2025

TL;DR

CityEQA introduces an embodied question-answering task and benchmark for urban environments, together with a hierarchical agent (PMA) that reaches 60.7% of human-level accuracy on city-space exploration and reasoning tasks.

Contribution

The paper presents CityEQA, a new embodied question-answering task for city spaces; CityEQA-EC, its benchmark dataset; and Planner-Manager-Actor (PMA), a hierarchical agent for long-horizon planning and spatial reasoning in city environments.

Findings

01. PMA achieves 60.7% of human-level accuracy.

02. The CityEQA-EC dataset contains 1,412 human-annotated tasks across six categories.

03. Hierarchical planning improves urban spatial reasoning.

Abstract

Embodied Question Answering (EQA) has primarily focused on indoor environments, leaving the complexities of urban settings (spanning environment, action, and perception) largely unexplored. To bridge this gap, we introduce CityEQA, a new task where an embodied agent answers open-vocabulary questions through active exploration in dynamic city spaces. To support this task, we present CityEQA-EC, the first benchmark dataset featuring 1,412 human-annotated tasks across six categories, grounded in a realistic 3D urban simulator. Moreover, we propose Planner-Manager-Actor (PMA), a novel agent tailored for CityEQA. PMA enables long-horizon planning and hierarchical task execution: the Planner breaks question answering down into sub-tasks, the Manager maintains an object-centric cognitive map for spatial reasoning during process control, and the specialized Actors handle navigation,…
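
The division of labor among the Planner, Manager, and Actor roles described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class and function names (`SubTask`, `CognitiveMap`, `toy_planner`, `navigate_actor`, `Manager`), the dictionary-based "world", and the keyword-matching planner are all assumptions made for illustration. In PMA the planning and control steps are LLM-driven. Only a navigation actor is shown, because the abstract is cut off before listing the others.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Position = Tuple[float, float]


@dataclass
class SubTask:
    kind: str    # hypothetical label, e.g. "navigate"
    target: str  # object the sub-task refers to


@dataclass
class CognitiveMap:
    """Object-centric map: object name -> last observed position."""
    objects: Dict[str, Position] = field(default_factory=dict)

    def update(self, name: str, position: Position) -> None:
        self.objects[name] = position

    def locate(self, name: str) -> Optional[Position]:
        return self.objects.get(name)


def toy_planner(question: str, landmarks: List[str]) -> List[SubTask]:
    """Stand-in for the Planner: one sub-task per landmark named in the question."""
    return [SubTask("navigate", name) for name in landmarks if name in question]


def navigate_actor(task: SubTask, world: Dict[str, Position]) -> Optional[Position]:
    """Stand-in navigation Actor: looks the target up in a toy world."""
    return world.get(task.target)


class Manager:
    """Dispatches sub-tasks to actors and records what they find in the map."""

    def __init__(self, actors: Dict[str, Callable]):
        self.actors = actors
        self.cmap = CognitiveMap()
        self.log: List[Tuple[str, str, bool]] = []

    def run(self, subtasks: List[SubTask], world: Dict[str, Position]) -> CognitiveMap:
        for task in subtasks:
            position = self.actors[task.kind](task, world)
            if position is not None:
                self.cmap.update(task.target, position)
            self.log.append((task.kind, task.target, position is not None))
        return self.cmap


# Toy usage: the world, question, and landmark list are made up.
world = {"red car": (12.0, 40.0), "bakery": (15.0, 42.0)}
question = "What color is the awning of the bakery next to the red car?"
subtasks = toy_planner(question, landmarks=["red car", "bakery"])
manager = Manager({"navigate": navigate_actor})
cmap = manager.run(subtasks, world)
print(cmap.objects)
```

Keeping the map inside the Manager rather than inside each actor mirrors the abstract's description of the Manager as the component that holds spatial state, so later sub-tasks can reuse positions found by earlier ones.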

Taxonomy

Topics: Geographic Information Systems Studies · Topic Modeling