TL;DR
PeopleSearchBench is an open-source benchmark for evaluating AI-powered people search platforms on 119 real-world queries spanning four use cases. Relevance is judged by a factual verification pipeline.
Contribution
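It presents a multi-dimensional benchmark for people search systems together with Criteria-Grounded Verification. This method checks returned people against explicit, verifiable criteria extracted from each query, rather than relying on subjective, holistic LLM-as-judge scores.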
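The abstract describes the verification step as extracting explicit criteria from a query and checking each returned person against them with live web search, which yields a binary relevance label. The sketch below shows one way to turn per-criterion verdicts into those labels. It assumes a person counts as relevant only if every criterion is verified. The names `Criterion`, `is_relevant`, `judge_results`, and `toy_verify` are hypothetical, and a lookup table stands in for web-search evidence.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Criterion:
    """One explicit, checkable condition extracted from a query (hypothetical structure)."""
    name: str
    description: str


# Stand-in for evidence lookup: returns True when the verifier confirms that
# `person` satisfies `criterion` (the paper describes using live web search).
Verifier = Callable[[str, Criterion], bool]


def is_relevant(person: str, criteria: List[Criterion], verify: Verifier) -> bool:
    """Binary relevance under the assumed rule: every criterion must be verified."""
    return all(verify(person, c) for c in criteria)


def judge_results(people: List[str], criteria: List[Criterion], verify: Verifier) -> List[int]:
    """Turn a ranked result list into 0/1 relevance labels, keeping rank order."""
    return [int(is_relevant(p, criteria, verify)) for p in people]


# Toy example: a lookup table plays the role of web-search evidence.
verified_facts = {"Alice": {"role", "location"}, "Bob": {"role"}}
query_criteria = [
    Criterion("role", "works as a backend engineer"),
    Criterion("location", "is based in Berlin"),
]


def toy_verify(person: str, criterion: Criterion) -> bool:
    return criterion.name in verified_facts.get(person, set())


print(judge_results(["Alice", "Bob", "Carol"], query_criteria, toy_verify))  # [1, 0, 0]
```

Requiring all criteria is the strictest aggregation. A partial-credit variant, such as the fraction of criteria satisfied, would give graded rather than binary labels.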
Findings
Lessie outperforms the other evaluated systems with an overall score of 65.2.
Lessie achieves 100% task completion on all queries.
The verification pipeline shows high agreement with human validation (Cohen's kappa = 0.84).
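Cohen's kappa measures agreement between two raters after correcting for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement computed from each rater's label frequencies. The sketch below computes it for binary relevance labels. The labels are invented toy data, not the paper's validation set, so the resulting value differs from the reported 0.84.

```python
from typing import Sequence


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Cohen's kappa for two raters assigning binary (0/1) labels to the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal rates of 1s and 0s.
    pa_yes = sum(rater_a) / n
    pb_yes = sum(rater_b) / n
    p_e = pa_yes * pb_yes + (1 - pa_yes) * (1 - pb_yes)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Toy labels (not the paper's data): pipeline verdicts vs. a human annotator.
pipeline = [1, 1, 0, 0, 1, 0, 1, 1]
human = [1, 1, 0, 1, 1, 0, 1, 0]
print(round(cohen_kappa(pipeline, human), 3))  # 0.467
```

Here raw agreement is 0.75, but both raters label 1 five times out of eight, so chance agreement is already about 0.53, and kappa drops to roughly 0.47.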
Abstract
AI-powered people search platforms are increasingly used in recruiting, sales prospecting, and professional networking, yet no widely accepted benchmark exists for evaluating their performance. We introduce PeopleSearchBench, an open-source benchmark that compares four people search platforms on 119 real-world queries across four use cases: corporate recruiting, B2B sales prospecting, expert search with deterministic answers, and influencer/KOL discovery. A key contribution is Criteria-Grounded Verification, a factual relevance pipeline that extracts explicit, verifiable criteria from each query and uses live web search to determine whether returned people satisfy them. This produces binary relevance judgments grounded in factual verification rather than subjective holistic LLM-as-judge scores. We evaluate systems on three dimensions: Relevance Precision (padded nDCG@10), Effective…
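The abstract names padded nDCG@10 as the Relevance Precision metric but does not define "padded" in this excerpt. The sketch below uses one plausible reading as an assumption. If a system returns fewer than 10 people, the missing slots count as irrelevant (gain 0), and the ideal ranking is taken to be 10 relevant people. With binary labels and the standard log2(rank + 1) discount, a short list can then never reach a perfect score. `dcg` and `padded_ndcg_at_k` are hypothetical names.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with the standard log2(rank + 1) discount (1-based ranks)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def padded_ndcg_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """nDCG@k over binary labels, padding short result lists with irrelevant slots.

    Assumption: the ideal ranking is k relevant results, so returning
    fewer than k people caps the achievable score below 1.0.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = list(relevance[:k])             # keep only the first k results
    padded = top + [0] * (k - len(top))   # missing slots contribute zero gain
    ideal = [1] * k                       # assumed ideal: k relevant people
    return dcg(padded) / dcg(ideal)


print(padded_ndcg_at_k([1] * 10))              # 1.0
print(padded_ndcg_at_k([]))                    # 0.0
print(round(padded_ndcg_at_k([1, 1, 1]), 3))   # 0.469
```

By contrast, a standard nDCG that builds the ideal ranking only from the returned list would give three relevant results a perfect 1.0. Padding therefore rewards systems that return a full list of relevant people.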
