LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services
Hang He, Chuhuai Yue, Chengqi Dong, Mingxue Tian, Hao Chen, Zhenfeng Liu, Jiajun Chai, Xiaohan Wang, Yufei Zhang, Qun Liao, Guojun Yin, Wei Lin, Chengcheng Wan, Haiying Sun, Ting Su

TL;DR
This paper introduces LocalSearchBench, a comprehensive benchmark for agentic search in local life services, highlighting the challenges faced by current models in multi-hop reasoning and domain-specific tasks.
Contribution
It presents the first large-scale benchmark dataset and environment for agentic search in local life services, addressing a gap in domain-specific multi-step reasoning evaluation.
Findings
State-of-the-art LRMs perform poorly on LocalSearchBench
Models struggle with completeness and faithfulness in local service queries
These results highlight the need for domain-specific training and benchmarks
Abstract
Recent advances in large reasoning models (LRMs) have enabled agentic search systems to perform complex multi-step reasoning across multiple sources. However, most studies focus on general information retrieval and rarely explore vertical domains with unique challenges. In this work, we focus on local life services and introduce LocalSearchBench, which encompasses diverse and complex business scenarios. Real-world queries in this domain are often ambiguous and require multi-hop reasoning across merchants and products, which remains challenging and is not fully addressed. As the first comprehensive benchmark for agentic search in local life services, LocalSearchBench comprises a database of over 1.3M merchant entries across 6 service categories and 9 major cities, and 900 multi-hop QA tasks from real user queries that require multi-step reasoning. We also developed LocalPlayground, a unified…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Multi-Agent Systems and Negotiation
