Evidence-Bound Autonomous Research (EviBound): A Governance Framework for Eliminating False Claims
Ruiying Chen

TL;DR
EviBound is a governance framework that uses dual evidence-based gates to eliminate false claims in autonomous research, ensuring verified results with minimal overhead.
Contribution
The paper introduces a dual-gate governance system that requires machine-checkable evidence both before and after execution, preventing autonomous research agents from reporting false claims.
Findings
EviBound achieves a 0% hallucination rate on the benchmark tasks.
The verification-only configuration reduces the hallucination rate to 25%.
The baseline prompt-level approach yields a 100% hallucination rate.
Abstract
LLM-based autonomous research agents report false claims: tasks marked "complete" despite missing artifacts, contradictory metrics, or failed executions. EviBound is an evidence-bound execution framework that eliminates false claims through dual governance gates requiring machine-checkable evidence. Two complementary gates enforce evidence requirements. The pre-execution Approval Gate validates acceptance criteria schemas before code runs, catching structural violations proactively. The post-execution Verification Gate validates artifacts via MLflow API queries (with recursive path checking) and optionally validates metrics when specified by acceptance criteria. Claims propagate only when backed by a queryable run ID, required artifacts, and FINISHED status. Bounded, confidence-gated retries (typically 1-2 attempts) recover from transient failures without unbounded loops. The…
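The two gates described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. The criteria layout (`required_artifacts`, `metric_thresholds`), the function names, and the reading of a metric criterion as a minimum value are all assumptions made for illustration. The `client` argument is expected to behave like `mlflow.tracking.MlflowClient`, of which only `get_run` and `list_artifacts` are used, so the block itself does not import MLflow.

```python
from __future__ import annotations

from typing import Any


def approval_gate(criteria: dict) -> list[str]:
    """Pre-execution check: validate the (hypothetical) criteria schema.

    Returns a list of structural violations; an empty list means approved.
    """
    errors = []
    artifacts = criteria.get("required_artifacts")
    if not isinstance(artifacts, list) or not artifacts:
        errors.append("required_artifacts must be a non-empty list")
    elif not all(isinstance(a, str) and a for a in artifacts):
        errors.append("each required artifact must be a non-empty path string")
    thresholds = criteria.get("metric_thresholds", {})
    if not isinstance(thresholds, dict) or not all(
        isinstance(v, (int, float)) for v in thresholds.values()
    ):
        errors.append("metric_thresholds must map metric names to numbers")
    return errors


def collect_artifact_paths(client: Any, run_id: str, path: str | None = None) -> set[str]:
    """Walk the run's artifact tree recursively and return all file paths."""
    found: set[str] = set()
    for info in client.list_artifacts(run_id, path):
        if info.is_dir:
            found |= collect_artifact_paths(client, run_id, info.path)
        else:
            found.add(info.path)
    return found


def verification_gate(client: Any, run_id: str, criteria: dict) -> list[str]:
    """Post-execution check: a claim passes only with an empty failure list."""
    try:
        run = client.get_run(run_id)
    except Exception as exc:  # the run ID must be queryable at all
        return [f"run {run_id!r} is not queryable: {exc}"]

    failures = []
    if run.info.status != "FINISHED":
        failures.append(f"run status is {run.info.status}, expected FINISHED")

    present = collect_artifact_paths(client, run_id)
    for artifact in criteria["required_artifacts"]:
        if artifact not in present:
            failures.append(f"missing artifact: {artifact}")

    # Metrics are checked only when the criteria specify them.
    # Assumption: each threshold is a minimum acceptable value.
    for name, minimum in criteria.get("metric_thresholds", {}).items():
        value = run.data.metrics.get(name)
        if value is None or value < minimum:
            failures.append(f"metric {name}={value} does not meet minimum {minimum}")
    return failures
```

In this sketch a task would be reported as complete only if `approval_gate` returns no errors before the code runs and `verification_gate` returns no failures afterwards. A bounded retry loop could wrap the execution step between the two calls.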
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Biomedical Text Mining and Ontologies
