Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

Yichi Zhang; Nabeel Seedat; Yinpeng Dong; Peng Cui; Jun Zhu; Mihaela van de Schaar

arXiv:2603.02798 · cs.AI · March 4, 2026

TL;DR

GLEAN is a verification framework for high-stakes LLM agents. It grounds evidence accumulation in expert-curated guidelines and uses active verification to make agent decisions more trustworthy, and it is evaluated on clinical diagnosis.

Contribution

It introduces guideline-grounded evidence accumulation combined with uncertainty-triggered active verification, improving both the accuracy and the calibration of high-stakes agent verification.

Findings

01

GLEAN outperforms baselines with 12% higher AUROC in clinical diagnosis.

02

GLEAN reduces Brier score by 50%, indicating better calibration.

03

Clinicians recognize GLEAN's practical utility.
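The findings above are stated in terms of AUROC and Brier score. As a reminder of what these metrics measure, here is a minimal sketch of their standard definitions (not code from the paper): AUROC is the probability that the verifier scores a randomly chosen correct decision above a randomly chosen incorrect one, while the Brier score is the mean squared error of the predicted correctness probabilities and rewards both calibration and sharpness (lower is better). The function names and example values are hypothetical.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probabilities and 0/1 outcomes
    (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def auroc(scores, labels):
    """Probability that a randomly chosen positive is scored above a randomly
    chosen negative, counting ties as 1/2. Quadratic in the number of examples,
    which is fine for small illustrative inputs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical verifier outputs: label 1 means the agent's decision was correct.
labels = [1, 0, 1, 1, 0]
probs = [0.9, 0.2, 0.7, 0.3, 0.4]
print(f"Brier = {brier_score(probs, labels):.3f}, AUROC = {auroc(probs, labels):.3f}")
# prints: Brier = 0.158, AUROC = 0.833
```

The two metrics capture different things: AUROC depends only on how the scores are ordered, so it is unchanged by any monotone rescaling of the probabilities, whereas the Brier score changes whenever the probability values themselves move away from the observed outcomes.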

Abstract

As LLM-powered agents are increasingly used for high-stakes decision-making, such as clinical diagnosis, reliable verification of their decisions becomes critical for trustworthy deployment. Yet existing verifiers usually underperform because they lack domain knowledge and are poorly calibrated. To address this, we introduce GLEAN, an agent verification framework with Guideline-grounded Evidence Accumulation that compiles expert-curated protocols into trajectory-informed, well-calibrated correctness signals. GLEAN evaluates how well each step of an agent's trajectory aligns with domain guidelines and aggregates the multi-guideline ratings into surrogate features, which are accumulated along the trajectory and calibrated into correctness probabilities using Bayesian logistic regression. Moreover, the estimated uncertainty triggers active verification, which selectively collects additional evidence for…
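The abstract outlines a pipeline (per-step guideline ratings, aggregation into surrogate features, accumulation along the trajectory, Bayesian logistic regression calibration, uncertainty-triggered active verification) but not its exact formulas. The sketch below is a hypothetical illustration under our own assumptions, not GLEAN's implementation: ratings are numbers in [0, 1]; each step's ratings are aggregated into two features (mean and minimum across guidelines); features are accumulated by averaging over steps; Bayesian logistic regression is approximated with a Laplace (Gaussian) posterior and the probit approximation for predictions; and active verification fires when the posterior variance of the logit exceeds an arbitrary threshold. All function names (`step_features`, `accumulate`, `fit_laplace`, `predict`, `needs_active_verification`), the threshold, and the synthetic data are hypothetical.

```python
import numpy as np

# Hypothetical sketch under stated assumptions; not GLEAN's actual implementation.


def step_features(ratings):
    """Aggregate one step's per-guideline ratings (assumed in [0, 1]) into
    surrogate features. Assumption: mean and minimum across guidelines."""
    r = np.asarray(ratings, dtype=float)
    return np.array([r.mean(), r.min()])


def accumulate(trajectory):
    """Accumulate step features along a trajectory (assumption: average over
    steps) and prepend a constant 1.0 as the bias feature."""
    feats = np.stack([step_features(step) for step in trajectory])
    return np.concatenate([[1.0], feats.mean(axis=0)])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_laplace(X, y, prior_var=1.0, iters=50):
    """Bayesian logistic regression with a N(0, prior_var * I) prior, using the
    Laplace approximation: Newton's method finds the MAP weights, and the
    inverse Hessian of the negative log posterior is the posterior covariance."""
    d = X.shape[1]
    prior_prec = np.eye(d) / prior_var
    w = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) + prior_prec @ w
        hess = X.T @ (X * (p * (1 - p))[:, None]) + prior_prec
        w = w - np.linalg.solve(hess, grad)
    p = sigmoid(X @ w)
    hess = X.T @ (X * (p * (1 - p))[:, None]) + prior_prec
    return w, np.linalg.inv(hess)


def predict(x, w, cov):
    """Return the predictive correctness probability via the probit
    approximation sigmoid(mu / sqrt(1 + pi * s2 / 8)), and the logit variance s2."""
    mu = x @ w
    s2 = x @ cov @ x
    return sigmoid(mu / np.sqrt(1.0 + np.pi * s2 / 8.0)), s2


def needs_active_verification(s2, threshold=0.5):
    """Hypothetical trigger: request more evidence when the logit is uncertain."""
    return bool(s2 > threshold)


# Synthetic training data (hypothetical): 40 trajectories of 4 steps x 3 guidelines;
# correct trajectories tend to receive higher guideline ratings.
rng = np.random.default_rng(0)
trajs, labels = [], []
for _ in range(40):
    correct = rng.random() < 0.5
    center = 0.75 if correct else 0.35
    trajs.append([rng.uniform(center - 0.2, center + 0.2, size=3) for _ in range(4)])
    labels.append(1.0 if correct else 0.0)
X = np.stack([accumulate(t) for t in trajs])
y = np.array(labels)
w, cov = fit_laplace(X, y)

# Score a new well-aligned trajectory and a poorly aligned one.
p_hi, s2_hi = predict(accumulate([[0.9, 0.8, 0.85], [0.8, 0.9, 0.7]]), w, cov)
p_lo, s2_lo = predict(accumulate([[0.2, 0.3, 0.25], [0.3, 0.2, 0.1]]), w, cov)
print(f"p_correct={p_hi:.3f}, verify_more={needs_active_verification(s2_hi)}")
print(f"p_correct={p_lo:.3f}, verify_more={needs_active_verification(s2_lo)}")
```

The Laplace approximation is used here only because it is one common, compact way to obtain a posterior for logistic regression with plain NumPy; the paper may use a different inference scheme, feature aggregation, or uncertainty criterion.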

Taxonomy

Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning