Cascaded Robust Rectification for Arbitrary Document Images

Chaoyun Wang; Quanxin Huang; I-Chao Shen; Takeo Igarashi; Nanning Zheng; Caigui Jiang

arXiv:2511.23150·cs.CV·December 1, 2025

Open Access

TL;DR

This paper introduces a multi-stage framework for robust document image rectification that progressively corrects various distortions, achieving state-of-the-art results and proposing new evaluation metrics for geometric correction quality.

Contribution

The paper presents a novel coarse-to-fine, multi-stage approach to document rectification that corrects perspective distortion, physical deformation, and fine-grained content distortion in turn, and it introduces new evaluation metrics.

Findings

01

Achieves a 14.1% to 34.7% reduction in the AAD metric.

02

Establishes new state-of-the-art performance on multiple benchmarks.

03

Proposes layout-aligned OCR metrics and masked distortion metrics for better evaluation.

Abstract

Document rectification in real-world scenarios poses significant challenges due to extreme variations in camera perspectives and physical distortions. Driven by the insight that complex transformations can be decomposed and resolved progressively, we introduce a novel multi-stage framework that progressively reverses distinct distortion types in a coarse-to-fine manner. Specifically, our framework first performs a global affine transformation to correct perspective distortions arising from the camera's viewpoint, then rectifies geometric deformations resulting from physical paper curling and folding, and finally employs a content-aware iterative process to eliminate fine-grained content distortions. To address limitations in existing evaluation protocols, we also propose two enhanced metrics: layout-aligned OCR metrics (AED/ACER) for a stable assessment that decouples geometric…
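The coarse-to-fine cascade described in the abstract can be pictured as a chain of backward warps applied one after another. The following is a minimal NumPy sketch of that idea only, not the authors' implementation: the function names (`identity_grid`, `affine_grid`, `sample_nearest`, `compose`, `rectify`) are hypothetical, the stage maps are hand-written stand-ins, and nearest-neighbour sampling is assumed for simplicity. How the paper actually obtains each stage's transformation is not covered by this summary.

```python
import numpy as np

def identity_grid(h, w):
    """Backward map that sends every output pixel (y, x) to itself."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([ys, xs], axis=-1).astype(np.float64)

def affine_grid(h, w, matrix, offset):
    """Stand-in for the global stage: source = matrix @ (y, x) + offset."""
    grid = identity_grid(h, w)
    return grid @ np.asarray(matrix, dtype=np.float64).T + np.asarray(offset, dtype=np.float64)

def sample_nearest(array, grid):
    """Read `array` at the (y, x) locations in `grid` (nearest neighbour, border-clamped)."""
    h, w = array.shape[:2]
    ys = np.clip(np.rint(grid[..., 0]).astype(int), 0, h - 1)
    xs = np.clip(np.rint(grid[..., 1]).astype(int), 0, w - 1)
    return array[ys, xs]

def compose(first, second):
    """Backward map equivalent to warping with `first`, then with `second`."""
    return sample_nearest(first, second)

def rectify(image, stages):
    """Fold all stage maps into one composite map, then resample the input once."""
    composite = identity_grid(*image.shape[:2])
    for stage in stages:
        composite = compose(composite, stage)
    return sample_nearest(image, composite)

# Toy example: three hand-made stage maps on a 6x6 "image".
image = np.arange(36).reshape(6, 6)
global_stage = affine_grid(6, 6, [[1, 0], [0, 1]], [0, 1])  # shift by one column
deform_stage = identity_grid(6, 6)
deform_stage[..., 0] += 1.0                                  # shift by one row
fine_stage = identity_grid(6, 6)                             # no-op refinement
rectified = rectify(image, [global_stage, deform_stage, fine_stage])
```

Composing the maps before touching the image means the input is resampled only once, which is one common way to keep repeated interpolation from blurring the result. Whether the paper composes its stages this way is an assumption of this sketch.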

Taxonomy

Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques