VERITAS: Verifiable Epistemic Reasoning for Image-Derived Hypothesis Testing via Agentic Systems
Lucas Stoffl, Benedikt Wiestler, Johannes C. Paetzold

TL;DR
VERITAS is a multi-agent system that automates hypothesis testing on multimodal clinical data, providing an auditable trail and classifying outcomes with an epistemic evidence framework.
Contribution
It introduces a multi-agent workflow with an epistemic evidence label framework for verifiable, interpretable hypothesis testing in medical imaging.
Findings
VERITAS achieves 81.4% verdict accuracy with frontier models.
It produces 86.6% verifiable statistical outputs.
It outperforms five single-model baselines in both accuracy and verifiability.
Abstract
Drawing meaningful conclusions from inherently multimodal clinical data (including medical imaging) requires coordinating expertise across the clinical specialty, radiology, programming, and biostatistics. This fragmented process bottlenecks discovery. We present VERITAS (Verifiable Epistemic Reasoning for Image-Derived Hypothesis Testing via Agentic Systems), a multi-agent system that autonomously tests natural-language hypotheses on multimodal clinical datasets while producing a fully auditable evidence trail: every statistical conclusion traces through inspectable, executable outputs from analysis plan to segmentation masks to statistical code to final verdict. VERITAS decomposes the workflow into four phases handled by role-specialized agents, and introduces an epistemic evidence label framework that mechanically classifies outcomes as Supported, Refuted, Underpowered, or Invalid by…
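The abstract is cut off before it states how the four evidence labels are assigned, so the rule below is not the authors' method. It is a minimal sketch of what a mechanical, rule-based labelling step could look like, assuming a conventional decision rule based on a validity check, a p-value threshold, the direction of the effect, and achieved statistical power. Only the four label names come from the abstract. The names `StatResult`, `classify`, `alpha`, and `min_power`, and the thresholds, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceLabel(Enum):
    # The four label names are taken from the abstract.
    SUPPORTED = "Supported"
    REFUTED = "Refuted"
    UNDERPOWERED = "Underpowered"
    INVALID = "Invalid"


@dataclass
class StatResult:
    # Hypothetical summary of one statistical test; every field is an assumption.
    checks_passed: bool        # did the data and test-assumption checks pass?
    p_value: float             # p-value of the test
    direction_matches: bool    # does the effect point the way the hypothesis predicts?
    achieved_power: float      # estimated power at the observed sample size


def classify(result: StatResult, alpha: float = 0.05, min_power: float = 0.8) -> EvidenceLabel:
    """Assign a label with fixed rules (assumed rule, not the paper's)."""
    if not result.checks_passed:
        return EvidenceLabel.INVALID
    if result.p_value < alpha:
        # Significant effect: supported only if it points in the predicted direction.
        return EvidenceLabel.SUPPORTED if result.direction_matches else EvidenceLabel.REFUTED
    if result.achieved_power < min_power:
        # Non-significant and too little power to rule the effect out.
        return EvidenceLabel.UNDERPOWERED
    # Non-significant despite adequate power.
    return EvidenceLabel.REFUTED


print(classify(StatResult(True, 0.01, True, 0.9)).value)
print(classify(StatResult(True, 0.40, True, 0.3)).value)
```

Because the label is a pure function of recorded test outputs, anyone holding the same outputs can recompute it, which is the kind of property an auditable evidence trail depends on.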
