RaV-IDP: A Reconstruction-as-Validation Framework for Faithful Intelligent Document Processing

Pritesh Jha

arXiv:2604.23644·cs.CV·April 28, 2026

TL;DR

RaV-IDP is a document processing pipeline that reconstructs each extracted entity and compares the reconstruction against the original document region, so that extraction errors are detected instead of passing silently to downstream systems.

Contribution

The paper presents a reconstruction-based validation architecture with a GPT-4.1 fallback, intended to make the output of document extraction pipelines more trustworthy.

Findings

01 Reconstruction-based fidelity scores effectively verify extraction accuracy.

02 The framework triggers GPT-4.1 fallback when fidelity is low.

03 Public code implementation is available for experimentation.

Abstract

Intelligent document processing pipelines extract structured entities (tables, images, and text) from documents for use in downstream systems such as knowledge bases, retrieval-augmented generation, and analytics. A persistent limitation of existing pipelines is that extraction output is produced without any intrinsic mechanism to verify whether it faithfully represents the source. Model-internal confidence scores measure inference certainty, not correspondence to the document, and extraction errors pass silently into downstream consumers. We present Reconstruction as Validation (RaV-IDP), a document processing pipeline that introduces reconstruction as a first-class architectural component. After each entity is extracted, a dedicated reconstructor renders the extracted representation back into a form comparable to the original document region, and a comparator scores fidelity between…
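
The extract, reconstruct, compare, and fall back loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the RaV-IDP repository: the names `reconstruct_text`, `compare`, `validate`, and `ValidatedEntity` are hypothetical, the source region is stood in for by a plain string rather than a rendered document region, the comparator is a simple character-level similarity from Python's `difflib`, and the 0.9 threshold is arbitrary. The fallback extractor is passed in as a callable, where the paper uses GPT-4.1.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class ValidatedEntity:
    value: str           # the extracted representation that was kept
    fidelity: float      # similarity between reconstruction and source, in [0, 1]
    used_fallback: bool  # True if the fallback extractor produced `value`


def reconstruct_text(extracted: str) -> str:
    """Hypothetical reconstructor: render the extraction back into a
    comparable form. Here that only means collapsing whitespace."""
    return " ".join(extracted.split())


def compare(source_region: str, reconstruction: str) -> float:
    """Hypothetical comparator: character-level similarity between the
    whitespace-normalized source region and the reconstruction."""
    normalized_source = " ".join(source_region.split())
    return SequenceMatcher(None, normalized_source, reconstruction).ratio()


def validate(
    source_region: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    threshold: float = 0.9,  # assumed value, not from the paper
) -> ValidatedEntity:
    """Extract with `primary`, score the reconstruction against the source,
    and re-extract with `fallback` when fidelity is below the threshold."""
    value = primary(source_region)
    score = compare(source_region, reconstruct_text(value))
    if score >= threshold:
        return ValidatedEntity(value, score, used_fallback=False)

    # Low fidelity: hand the region to the fallback extractor and re-score.
    fallback_value = fallback(source_region)
    fallback_score = compare(source_region, reconstruct_text(fallback_value))
    return ValidatedEntity(fallback_value, fallback_score, used_fallback=True)
```

In this sketch the fidelity score depends only on the source and the reconstruction, not on any confidence reported by the extractor, which is the distinction the abstract draws between inference certainty and correspondence to the document.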

Code & Models

Repositories

pritesh-2711/RaV-IDP (GitHub)
