Fast and Faithful: Real-Time Verification for Long-Document Retrieval-Augmented Generation Systems

Xunzhuo Liu; Bowei He; Xue Liu; Haichen Zhang; Huamin Chen

arXiv:2603.23508·cs.CL·March 26, 2026

Open Access

TL;DR

This paper describes a real-time verification component for long-document retrieval-augmented generation that checks whether responses are grounded in the full source documents while staying within the latency limits of an interactive service.

Contribution

The paper presents an architecture for full-document verification in RAG systems that validates long contexts (up to 32K tokens) in real time, instead of checking only truncated passages.

Findings

01 Full-context verification improves detection of unsupported responses.

02 Chunk-based checking often fails on real documents.

03 Latency constraints influence model design and verification strategies.

Abstract

Retrieval-augmented generation (RAG) is increasingly deployed in enterprise search and document-centric assistants, where responses must be grounded in long and complex source materials. In practice, verifying that generated answers faithfully reflect retrieved documents is difficult: large language models can check long contexts but are too slow and costly for interactive services, while lightweight classifiers operate within strict context limits and frequently miss evidence outside truncated passages. We present the design of a real-time verification component integrated into a production RAG pipeline that enables full-document grounding under latency constraints. The system processes documents up to 32K tokens and employs adaptive inference strategies to balance response time and verification coverage across workloads. We describe the architectural decisions, operational trade-offs,…
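The truncation problem described in the abstract can be illustrated with a toy sketch. This is not the paper's verifier: the token-overlap check below is a deliberately simple stand-in for a learned classifier, and every name here (`is_supported`, `verify_truncated`, `verify_full`), the 512-token limit, and the 0.8 threshold are assumptions chosen only for illustration. It shows how a checker that sees only the first part of a document can mark a correct answer as unsupported when the evidence sits near the end.

```python
def tokenize(text):
    # Whitespace tokenization; a real system would use a model tokenizer.
    return text.lower().split()


def is_supported(claim, context_tokens, threshold=0.8):
    # Toy support check: fraction of claim tokens that appear in the context.
    # Stand-in for a learned grounding classifier (assumption, not the paper's model).
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return False
    context_set = set(context_tokens)
    hits = sum(1 for token in claim_tokens if token in context_set)
    return hits / len(claim_tokens) >= threshold


def verify_truncated(claim, document, max_tokens):
    # Checker with a hard context limit: only the first max_tokens tokens are seen.
    return is_supported(claim, tokenize(document)[:max_tokens])


def verify_full(claim, document):
    # Checker that sees the whole document.
    return is_supported(claim, tokenize(document))


# 600 tokens of filler, followed by the one sentence that supports the claim.
filler = "the report covers general background material " * 100
document = filler + "the warranty period is 24 months"
claim = "the warranty period is 24 months"

truncated_result = verify_truncated(claim, document, max_tokens=512)
full_result = verify_full(claim, document)

print("truncated checker says supported:", truncated_result)
print("full-document checker says supported:", full_result)
```

In this sketch the supporting sentence begins after token 600, so the truncated checker never sees it and rejects the claim, while the full-document checker accepts it.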

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Natural Language Processing Techniques