iAgentBench: Benchmarking Sensemaking Capabilities of Information-Seeking Agents on High-Traffic Topics

Preetam Prabhu Srikar Dammu; Arnav Palkhiwala; Tanya Roosta; Chirag Shah

arXiv:2603.04656 · cs.CL · March 6, 2026

Open Access

TL;DR

iAgentBench is a new benchmark designed to evaluate the ability of information-seeking agents to perform complex sensemaking tasks involving multiple sources, addressing limitations of existing QA benchmarks.

Contribution

The paper introduces iAgentBench, a dynamic, realistic benchmark that assesses higher-level information synthesis and reasoning in open-domain question answering systems.

Findings

01 Retrieval improves accuracy but is insufficient alone.

02 Existing benchmarks do not effectively measure multi-source sensemaking.

03 Evaluation of evidence use is crucial for understanding system capabilities.

Abstract

With the emergence of search-enabled generative QA systems, users are increasingly turning to tools that browse, aggregate, and reconcile evidence across multiple sources on their behalf. Yet many widely used QA benchmarks remain answerable by retrieving a single relevant passage, making them poorly suited for measuring cross-source sensemaking, such as integrating evidence, tracking causal links, and resolving dependencies across facets of a topic. We present iAgentBench, a dynamic ODQA benchmark that targets these higher-level information needs while keeping questions natural and grounded in realistic information-seeking behavior. iAgentBench draws seed topics from real-world attention signals and uses common user intent patterns to construct user-like questions whose answers require combining evidence from multiple sources, not just extracting a single snippet. Each instance is…

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Expert Finding and Q&A Systems