VERIRAG: A Post-Retrieval Auditing of Scientific Study Summaries
Shubham Mohole, Hongjun Choi, Shusen Liu, Christine Klymko, Shashank Kushwaha, Derek Shi, Wesam Sakla, Sainyam Galhotra, Ruben Glatt

TL;DR
VERIRAG is a post-retrieval auditing framework that uses language models to detect methodological vulnerabilities behind scientific summaries, helping community gatekeepers decide which information is reliable enough to amplify.
Contribution
VERIRAG introduces a post-retrieval auditing approach, the Veritable taxonomy of statistical rigor, and a benchmark of 1,730 perturbed summaries for detecting methodological flaws in scientific summaries.
Findings
VERIRAG improves detection accuracy by at least 19 F1 points.
The system generalizes across different language model architectures.
Human testers find over 80% of audit trails useful for decision-making.
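For readers unfamiliar with the metric, the sketch below shows how a gain in "F1 points" is typically computed: F1 is the harmonic mean of precision and recall, and a point is one hundredth of the 0 to 1 scale. The function names and the confusion counts are hypothetical, chosen only for illustration; they are not results or code from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_point_gain(baseline_f1: float, improved_f1: float) -> float:
    """Express the difference between two F1 scores in points (1 point = 0.01)."""
    return 100 * (improved_f1 - baseline_f1)


# Hypothetical confusion counts for a flaw detector (NOT taken from the paper).
baseline = f1_score(tp=50, fp=30, fn=50)  # assumed baseline system
audited = f1_score(tp=75, fp=25, fn=25)   # assumed system with auditing
gain = f1_point_gain(baseline, audited)

print(f"baseline F1 = {baseline:.3f}, audited F1 = {audited:.3f}, gain = {gain:.1f} points")
```

With these made-up counts the gain comes out a little above 19 points, which gives a sense of scale for the first finding; the actual baselines and counts are those reported in the paper.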
Abstract
Can democratized information gatekeepers and community note writers effectively decide what scientific information to amplify? Lacking domain expertise, such gatekeepers rely on automated reasoning agents that use RAG to ground evidence to cited sources. But such standard RAG systems validate summaries via semantic grounding and suffer from "methodological blindness," treating all cited evidence as equally valid regardless of rigor. To address this, we introduce VERIRAG, a post-retrieval auditing framework that shifts the task from classification to methodological vulnerability detection. Using private Small Language Models (SLMs), VERIRAG audits source papers against the Veritable taxonomy of statistical rigor. We contribute: (1) a benchmark of 1,730 summaries with realistic, non-obvious perturbations modeled after retracted papers; (2) the auditable Veritable taxonomy; and (3) an…
Taxonomy
Topics: Data Quality and Management
