Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses

Bin Han; Robert Wolfe; Anat Caspi; Bill Howe

arXiv:2508.05009·cs.AI·August 8, 2025

TL;DR

This paper investigates the use of large language models for integrating complex urban spatial datasets, highlighting their reasoning strengths and computational weaknesses, and proposes a review-and-refine method to improve accuracy.

Contribution

It demonstrates the potential of LLMs for spatial data integration, introduces a review-and-refine approach, and discusses future directions for enhancing spatial reasoning capabilities.

Findings

01. LLMs can reason about environmental spatial relationships.

02. LLMs struggle with connecting macro-scale environments to computational geometry.

03. A review-and-refine method effectively corrects errors.

Abstract

We explore the application of large language models (LLMs) to empower domain experts in integrating large, heterogeneous, and noisy urban spatial datasets. Traditional rule-based integration methods cannot cover all edge cases and therefore require manual verification and repair. Machine learning approaches require collecting and labeling large numbers of task-specific samples. In this study, we investigate the potential of LLMs for spatial data integration. Our analysis first considers how LLMs reason about environmental spatial relationships mediated by human experience, such as those between roads and sidewalks. We show that while LLMs exhibit spatial reasoning capabilities, they struggle to connect the macro-scale environment with the relevant computational geometry tasks, often producing logically incoherent responses. But when provided relevant features, thereby reducing dependence on…
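To make the idea of "relevant features" concrete, the following is a minimal sketch of precomputing geometric features for a sidewalk/road pair so that a model does not have to derive them from raw coordinates. The specific features (acute angle and minimum distance between two segments), the function names, and the output string format are illustrative assumptions, not details taken from the paper. Coordinates are assumed to be in a projected planar system (for example, meters), not latitude/longitude.

```python
import math

def segment_angle_deg(a, b):
    """Acute angle in degrees between two segments ((x1, y1), (x2, y2))."""
    (ax1, ay1), (ax2, ay2) = a
    (bx1, by1), (bx2, by2) = b
    ta = math.atan2(ay2 - ay1, ax2 - ax1)
    tb = math.atan2(by2 - by1, bx2 - bx1)
    diff = abs(ta - tb) % math.pi          # direction of travel is ignored
    return math.degrees(min(diff, math.pi - diff))

def point_segment_distance(p, seg):
    """Distance from point p to the closest point on segment seg."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:                     # degenerate segment (a single point)
        return math.hypot(p[0] - x1, p[1] - y1)
    t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))              # clamp the projection onto the segment
    return math.hypot(p[0] - (x1 + t * dx), p[1] - (y1 + t * dy))

def _orient(p, q, r):
    """Signed area test: >0 if r is left of p->q, <0 if right, 0 if collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a, b):
    """True if the segments properly cross at an interior point."""
    d1, d2 = _orient(b[0], b[1], a[0]), _orient(b[0], b[1], a[1])
    d3, d4 = _orient(a[0], a[1], b[0]), _orient(a[0], a[1], b[1])
    return d1 * d2 < 0 and d3 * d4 < 0

def segment_distance(a, b):
    """Minimum distance between two segments (0 if they touch or cross)."""
    if segments_cross(a, b):
        return 0.0
    return min(
        point_segment_distance(a[0], b), point_segment_distance(a[1], b),
        point_segment_distance(b[0], a), point_segment_distance(b[1], a),
    )

def describe_pair(sidewalk, road):
    """Hypothetical feature string that could be placed in a prompt."""
    angle = segment_angle_deg(sidewalk, road)
    dist = segment_distance(sidewalk, road)
    return f"min_angle_deg={angle:.1f}, min_distance={dist:.2f}"

sidewalk = ((0.0, 0.0), (10.0, 0.0))
road = ((0.0, 3.0), (10.0, 3.0))
print(describe_pair(sidewalk, road))
```

For the parallel pair above, the sketch reports an angle of 0 degrees and a distance of 3 units. Supplying such values directly is one way a prompt could shift the geometric computation away from the model, leaving it to judge only whether the pair plausibly belongs together.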
