LocalBench: Benchmarking LLMs on County-Level Local Knowledge and Reasoning
Zihan Gao, Yifei Xu, Jacob Thebault-Spieker

TL;DR
LocalBench is a comprehensive benchmark designed to evaluate large language models on county-level local knowledge across the U.S., revealing significant limitations in current models' ability to handle hyper-local information and reasoning.
Contribution
This paper introduces the first benchmark specifically targeting county-level local knowledge for LLMs, using diverse data sources and a new conceptual framework.
Findings
Models perform poorly on narrative and numerical questions.
Larger models and web augmentation do not always improve accuracy.
Web search improves accuracy for some models but lowers it for others.
Abstract
Large language models (LLMs) have been widely evaluated on macro-scale geographic tasks, such as global factual recall, event summarization, and regional reasoning. Yet, their ability to handle hyper-local knowledge remains poorly understood. This gap is increasingly consequential as real-world applications, from civic platforms to community journalism, demand AI systems that can reason about neighborhood-specific dynamics, cultural narratives, and local governance. Existing benchmarks fall short in capturing this complexity, often relying on coarse-grained data or isolated references. We present LocalBench, the first benchmark designed to systematically evaluate LLMs on county-level local knowledge across the United States. Grounded in the Localness Conceptual Framework, LocalBench includes 14,782 validated question-answer pairs across 526 U.S. counties in 49 states, integrating…
Taxonomy
Topics: Human Mobility and Location-Based Analysis · Geographic Information Systems Studies · Data-Driven Disease Surveillance
