VERIRAG: A Post-Retrieval Auditing of Scientific Study Summaries

Shubham Mohole; Hongjun Choi; Shusen Liu; Christine Klymko; Shashank Kushwaha; Derek Shi; Wesam Sakla; Sainyam Galhotra; Ruben Glatt

arXiv:2507.17948 · cs.IR · December 8, 2025

TL;DR

VERIRAG is an auditing framework that uses language models to detect methodological vulnerabilities in scientific summaries, helping community gatekeepers decide more reliably which scientific information to amplify.

Contribution

It introduces a new post-retrieval auditing approach, a vulnerability taxonomy, and a benchmark dataset to improve detection of methodological flaws in scientific summaries.

Findings

01

VERIRAG improves detection by at least 19 F1 points.

02

The system generalizes across different language model architectures.

03

Human testers find over 80% of audit trails useful for decision-making.

Abstract

Can democratized information gatekeepers and community note writers effectively decide what scientific information to amplify? Lacking domain expertise, such gatekeepers rely on automated reasoning agents that use RAG to ground evidence to cited sources. But such standard RAG systems validate summaries via semantic grounding and suffer from "methodological blindness," treating all cited evidence as equally valid regardless of rigor. To address this, we introduce VERIRAG, a post-retrieval auditing framework that shifts the task from classification to methodological vulnerability detection. Using private Small Language Models (SLMs), VERIRAG audits source papers against the Veritable taxonomy of statistical rigor. We contribute: (1) a benchmark of 1,730 summaries with realistic, non-obvious perturbations modeled after retracted papers; (2) the auditable Veritable taxonomy; and (3) an…
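The abstract describes the audit only at a high level. As a rough illustration of what shifting from a single supported/unsupported label to per-criterion vulnerability detection might look like, here is a minimal Python (3.9+) sketch. The check names, the keyword rules standing in for the private SLM, and the `audit` and `flag_summary` helpers are all hypothetical; they are not the Veritable taxonomy or the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Finding:
    check: str          # name of a hypothetical taxonomy check
    vulnerable: bool    # True if the retrieved source fails this check
    rationale: str      # human-readable entry for the audit trail

# Hypothetical check signature: inspect the retrieved source text and return
# (vulnerable, rationale). In VERIRAG these judgments come from an SLM;
# simple keyword rules stand in for the model here.
Check = Callable[[str], tuple[bool, str]]

def reports_sample_size(source: str) -> tuple[bool, str]:
    ok = "n =" in source or "participants" in source
    return (not ok, "sample size reported" if ok else "no sample size reported")

def corrects_multiple_comparisons(source: str) -> tuple[bool, str]:
    text = source.lower()
    ok = "bonferroni" in text or "fdr" in text
    return (not ok, "correction applied" if ok else "no multiple-comparison correction found")

# Hypothetical two-item taxonomy (the real Veritable taxonomy is not reproduced here).
TAXONOMY: dict[str, Check] = {
    "sample_size": reports_sample_size,
    "multiple_comparisons": corrects_multiple_comparisons,
}

def audit(source: str, taxonomy: dict[str, Check] = TAXONOMY) -> list[Finding]:
    """Post-retrieval step: run every check on the cited source, keep the full trail."""
    trail = []
    for name, check in taxonomy.items():
        vulnerable, why = check(source)
        trail.append(Finding(name, vulnerable, why))
    return trail

def flag_summary(trail: list[Finding]) -> bool:
    # Flag the summary if its cited source shows any methodological vulnerability.
    return any(f.vulnerable for f in trail)

if __name__ == "__main__":
    source = "We recruited 40 participants and tested 12 outcomes at p < 0.05."
    trail = audit(source)
    for f in trail:
        print(f.check, f.vulnerable, f.rationale)
    print("flagged:", flag_summary(trail))
```

The design point the sketch tries to capture is that the output is a list of per-criterion findings with rationales, so a gatekeeper can see *why* a summary was flagged rather than receiving only a binary verdict.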

Taxonomy

Topics: Data Quality and Management