Benchmarking Agents in Insurance Underwriting Environments
Amanda Dsouza, Ramya Ramakrishnan, Charles Dickens, Bhavishya Pohani, Christopher M Glaze

TL;DR
This paper introduces UNDERWRITE, a realistic insurance underwriting benchmark designed with domain experts, revealing critical gaps in current AI models' performance and highlighting the importance of expert-driven evaluation for enterprise readiness.
Contribution
The paper presents UNDERWRITE, a novel, expert-designed benchmark that incorporates real-world complexities such as proprietary business knowledge, noisy tool interfaces, and imperfect simulated users, addressing limitations of existing benchmarks.
Findings
Significant performance gaps between research models and enterprise needs.
The most accurate models are not the most efficient or reliable.
Hallucination of domain knowledge persists despite tool access.
Abstract
As AI agents integrate into enterprise applications, their evaluation demands benchmarks that reflect the complexity of real-world operations. Yet existing benchmarks overemphasize open domains such as code, use narrow accuracy metrics, and lack authentic complexity. We present UNDERWRITE, an expert-first, multi-turn insurance underwriting benchmark designed in close collaboration with domain experts to capture real-world enterprise challenges. UNDERWRITE introduces critical realism factors often absent in current benchmarks: proprietary business knowledge, noisy tool interfaces, and imperfect simulated users requiring careful information gathering. Evaluating 13 frontier models, we uncover significant gaps between research lab performance and enterprise readiness: the most accurate models are not the most efficient, models hallucinate domain knowledge despite tool access, and…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education
