RAVine: Reality-Aligned Evaluation for Agentic Search
Yilong Xu, Xiang Long, Zhi Zheng, Jinhua Gao

TL;DR
RAVine is a new evaluation framework for agentic search that better aligns with realistic user search scenarios, assesses the iterative search process rather than only the final answer, and accounts for efficiency, with the aim of advancing the development of intelligent search systems.
Contribution
RAVine introduces a reality-aligned evaluation method for agentic search that focuses on multi-point queries, iterative process assessment, and fine-grained accuracy, addressing limitations of existing benchmarks.
Findings
Benchmarking models with RAVine reveals new insights into their search behaviors.
RAVine's ground truth construction improves evaluation accuracy.
The framework highlights the importance of iterative process assessment in agentic search.
Abstract
Agentic search, as a more autonomous and adaptive paradigm of retrieval augmentation, is driving the evolution of intelligent search systems. However, existing evaluation frameworks fail to align well with the goals of agentic search. First, the complex queries commonly used in current benchmarks often deviate from realistic user search scenarios. Second, prior approaches tend to introduce noise when extracting ground truth for end-to-end evaluations, leading to distorted assessments at a fine-grained level. Third, most current frameworks focus solely on the quality of final answers, neglecting the evaluation of the iterative process inherent to agentic search. To address these limitations, we propose RAVine, a Reality-Aligned eValuation framework for agentic LLMs with search. RAVine targets multi-point queries and long-form answers that better reflect user intents, and introduces an…
Taxonomy
Topics: Data Management and Algorithms · Semantic Web and Ontologies · Artificial Intelligence in Games
