PeopleSearchBench: A Multi-Dimensional Benchmark for Evaluating AI-Powered People Search Platforms

Wei Wang; Tianyu Shi; Shuai Zhang; Boyang Xia; Zequn Xie; Chenyu Zeng; Qi Zhang; Lynn Ai; Yaqi Yu; Kaiming Zhang; Feiyue Tang

arXiv:2603.27476 · cs.AI · March 31, 2026

TL;DR

PeopleSearchBench introduces a comprehensive, open-source benchmark for evaluating AI-powered people search platforms across multiple real-world use cases using a factual verification pipeline.

Contribution

It presents a multi-dimensional benchmark whose criteria-grounded verification method enables objective performance evaluation of people search systems.

Findings

1. Lessie outperforms the other evaluated systems with an overall score of 65.2.
2. Lessie achieves 100% task completion across all queries.
3. The verification pipeline shows high agreement with human validation (Cohen's kappa = 0.84).
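Cohen's kappa measures agreement between two raters after correcting for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e is the chance agreement implied by each rater's label frequencies. The page does not say how the 0.84 figure was computed. The sketch below assumes the two raters are the verification pipeline and a human annotator assigning binary relevance labels (1 = relevant, 0 = not) to the same returned people; `cohens_kappa` is a hypothetical helper name, not code from the paper's repository.

```python
from collections import Counter


def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for two raters labelling the same items (hypothetical helper)."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[label] / n) * (cb[label] / n) for label in set(ca) | set(cb))
    if p_e == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    pipeline_labels = [1, 1, 0, 0]
    human_labels = [1, 0, 0, 0]
    print(cohens_kappa(pipeline_labels, human_labels))
```

For example, labels [1, 1, 0, 0] and [1, 0, 0, 0] agree on 75% of items, but half of that agreement is expected by chance, giving κ = 0.5.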

Abstract

AI-powered people search platforms are increasingly used in recruiting, sales prospecting, and professional networking, yet no widely accepted benchmark exists for evaluating their performance. We introduce PeopleSearchBench, an open-source benchmark that compares four people search platforms on 119 real-world queries across four use cases: corporate recruiting, B2B sales prospecting, expert search with deterministic answers, and influencer/KOL discovery. A key contribution is Criteria-Grounded Verification, a factual relevance pipeline that extracts explicit, verifiable criteria from each query and uses live web search to determine whether returned people satisfy them. This produces binary relevance judgments grounded in factual verification rather than subjective holistic LLM-as-judge scores. We evaluate systems on three dimensions: Relevance Precision (padded nDCG@10), Effective…
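The abstract describes binary relevance judgments, derived from criteria extracted from each query, that feed a padded nDCG@10 score. The sketch below illustrates that flow under stated assumptions, not as the authors' implementation: `Criterion`, `is_relevant`, and `padded_ndcg_at_k` are hypothetical names; the live web-search verifier is replaced by a caller-supplied function; a person is assumed to be relevant only if every extracted criterion is verified; and "padded" is assumed to mean that empty slots in a result list shorter than 10 count as non-relevant, while the ideal DCG assumes 10 relevant results. The paper's exact definitions may differ.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One explicit, checkable requirement extracted from a query (hypothetical type)."""
    description: str


# Stand-in for the live web-search check: True if the person satisfies the criterion.
Verifier = Callable[[str, Criterion], bool]


def is_relevant(person: str, criteria: list[Criterion], verify: Verifier) -> int:
    """Binary relevance: 1 only if every criterion is verified (assumed aggregation rule)."""
    return int(all(verify(person, c) for c in criteria))


def padded_ndcg_at_k(rels: list[int], k: int = 10) -> float:
    """nDCG@k with the result list padded to length k using non-relevant (0) entries."""
    padded = (rels[:k] + [0] * k)[:k]
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(padded))
    # Assumed ideal ranking: k relevant people in positions 1..k.
    idcg = sum(1 / math.log2(i + 2) for i in range(k))
    return dcg / idcg


if __name__ == "__main__":
    criteria = [Criterion("Berlin"), Criterion("ML engineer")]

    # Toy verifier: substring match on a profile string instead of web search.
    def verify(person: str, c: Criterion) -> bool:
        return c.description in person

    results = ["Alice, ML engineer, Berlin", "Bob, designer, Berlin"]
    rels = [is_relevant(p, criteria, verify) for p in results]
    print(rels, round(padded_ndcg_at_k(rels), 3))
```

Under these assumptions, a system that returns only one relevant person scores well below 1.0, so short result lists are not rewarded for precision alone.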

Code & Models

Repositories

GitHub
