LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services

Hang He; Chuhuai Yue; Chengqi Dong; Mingxue Tian; Hao Chen; Zhenfeng Liu; Jiajun Chai; Xiaohan Wang; Yufei Zhang; Qun Liao; Guojun Yin; Wei Lin; Chengcheng Wan; Haiying Sun; Ting Su

arXiv:2512.07436·cs.AI·January 14, 2026

TL;DR

This paper introduces LocalSearchBench, a comprehensive benchmark for agentic search in local life services, highlighting the challenges faced by current models in multi-hop reasoning and domain-specific tasks.

Contribution

It presents the first large-scale benchmark dataset and environment for agentic search in local life services, addressing a gap in domain-specific multi-step reasoning evaluation.

Findings

01 State-of-the-art LRMs perform poorly on LocalSearchBench

02 Models struggle with completeness and faithfulness in local service queries

03 The results highlight the need for domain-specific training and benchmarks

Abstract

Recent advances in large reasoning models (LRMs) have enabled agentic search systems to perform complex multi-step reasoning across multiple sources. However, most studies focus on general information retrieval and rarely explore vertical domains with unique challenges. In this work, we focus on local life services and introduce LocalSearchBench, which encompasses diverse and complex business scenarios. Real-world queries in this domain are often ambiguous and require multi-hop reasoning across merchants and products, which remains challenging and not fully addressed. As the first comprehensive benchmark for agentic search in local life services, LocalSearchBench comprises a database of over 1.3M merchant entries across 6 service categories and 9 major cities, and 900 multi-hop QA tasks from real user queries that require multi-step reasoning. We also developed LocalPlayground, a unified…

Code & Models

Datasets

localsearchbench/localsearchbench
dataset · 33 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Multi-Agent Systems and Negotiation