TL;DR
EVGeoQA introduces a challenging benchmark for evaluating large language models' ability to perform dynamic, multi-objective geo-spatial exploration in electric vehicle charging scenarios, highlighting their strengths and limitations.
Contribution
The paper presents EVGeoQA, a novel geo-spatial benchmark with a dual-objective, location-anchored design, and proposes GeoRover, a framework for assessing LLMs in complex exploration tasks.
Findings
LLMs effectively use tools for sub-tasks but struggle with long-range spatial exploration.
LLMs can summarize exploration trajectories to improve efficiency.
EVGeoQA serves as a challenging testbed for geo-spatial intelligence.
Abstract
While Large Language Models (LLMs) demonstrate remarkable reasoning capabilities, their potential for purpose-driven exploration in dynamic geo-spatial environments remains under-investigated. Existing Geo-Spatial Question Answering (GSQA) benchmarks predominantly focus on static retrieval, failing to capture the complexity of real-world planning that involves dynamic user locations and compound constraints. To bridge this gap, we introduce EVGeoQA, a novel benchmark built upon Electric Vehicle (EV) charging scenarios that features a distinct location-anchored and dual-objective design. Specifically, each query in EVGeoQA is explicitly bound to a user's real-time coordinate and integrates the dual objectives of a charging necessity and a co-located activity preference. To systematically assess models in such complex settings, we further propose GeoRover, a general evaluation framework…
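To make the location-anchored, dual-objective query design more concrete, here is a minimal Python sketch. It assumes each query carries the user's coordinate and an activity preference, and that a station counts as a valid answer only when it satisfies both objectives. All names here are hypothetical illustrations, not EVGeoQA's actual schema or GeoRover's evaluation logic. That includes `EVQuery`, `Station`, `max_detour_km`, `nearby_categories` and `answer`. Modelling the charging necessity as a simple distance budget is also an assumption.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class Station:
    """Hypothetical charging station record."""
    name: str
    lat: float
    lon: float
    # Hypothetical: categories of points of interest co-located with the station.
    nearby_categories: set = field(default_factory=set)


@dataclass
class EVQuery:
    """Hypothetical location-anchored, dual-objective query."""
    user_lat: float        # location anchor: the user's real-time coordinate
    user_lon: float
    max_detour_km: float   # assumed proxy for the charging necessity (reachable range)
    activity: str          # co-located activity preference, e.g. "cafe"


def satisfies(query: EVQuery, station: Station) -> bool:
    """True only if the station meets BOTH objectives: reachable and co-located activity."""
    dist = haversine_km(query.user_lat, query.user_lon, station.lat, station.lon)
    return dist <= query.max_detour_km and query.activity in station.nearby_categories


def answer(query: EVQuery, stations: list) -> list:
    """Names of stations satisfying the query, nearest to the user first."""
    hits = [s for s in stations if satisfies(query, s)]
    hits.sort(key=lambda s: haversine_km(query.user_lat, query.user_lon, s.lat, s.lon))
    return [s.name for s in hits]
```

For example, with the user at `(0.0, 0.0)` and a 5 km budget, a station about 1.1 km away with a nearby cafe qualifies. A closer station that only has a gym does not qualify for a "cafe" query. A cafe station about 111 km away is filtered out as unreachable. Because the same station set yields different answers as the coordinate or preference changes, a static retrieval lookup alone is not enough.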
