Cascaded Robust Rectification for Arbitrary Document Images
Chaoyun Wang, Quanxin Huang, I-Chao Shen, Takeo Igarashi, Nanning Zheng, Caigui Jiang

TL;DR
This paper introduces a multi-stage framework for robust document image rectification that progressively corrects various distortions, achieving state-of-the-art results and proposing new evaluation metrics for geometric correction quality.
Contribution
The paper presents a novel coarse-to-fine, multi-stage approach to document rectification that addresses perspective distortion, physical deformations, and fine content distortions in turn, and it introduces new evaluation metrics.
Findings
Achieves a 14.1%–34.7% reduction in the AAD metric.
Establishes new state-of-the-art performance on multiple benchmarks.
Proposes layout-aligned OCR metrics and masked distortion metrics for better evaluation.
Abstract
Document rectification in real-world scenarios poses significant challenges due to extreme variations in camera perspectives and physical distortions. Driven by the insight that complex transformations can be decomposed and resolved progressively, we introduce a novel multi-stage framework that progressively reverses distinct distortion types in a coarse-to-fine manner. Specifically, our framework first performs a global affine transformation to correct perspective distortions arising from the camera's viewpoint, then rectifies geometric deformations resulting from physical paper curling and folding, and finally employs a content-aware iterative process to eliminate fine-grained content distortions. To address limitations in existing evaluation protocols, we also propose two enhanced metrics: layout-aligned OCR metrics (AED/ACER) for a stable assessment that decouples geometric…
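The coarse-to-fine cascade described in the abstract can be pictured as three coordinate corrections applied in sequence. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names (`affine_stage`, `deformation_stage`, `iterative_stage`, `rectify`) are hypothetical, and the learned components of each stage are stood in for by plain callables supplied by the caller.

```python
import numpy as np


def affine_stage(points, matrix, offset):
    """Stage 1 (coarse): apply one global 2x2 affine matrix plus an offset."""
    return points @ matrix.T + offset


def deformation_stage(points, displacement_fn):
    """Stage 2: subtract a per-point displacement (stand-in for curl/fold correction)."""
    return points - displacement_fn(points)


def iterative_stage(points, residual_fn, steps=10, tol=1e-6):
    """Stage 3 (fine): repeatedly remove a small residual until it is negligible."""
    for _ in range(steps):
        residual = residual_fn(points)
        if np.abs(residual).max() < tol:
            break
        points = points - residual
    return points


def rectify(points, matrix, offset, displacement_fn, residual_fn):
    """Chain the three hypothetical stages from coarse to fine."""
    points = affine_stage(points, matrix, offset)
    points = deformation_stage(points, displacement_fn)
    return iterative_stage(points, residual_fn)


# Toy usage: every input below is made up for illustration only.
target = np.array([[0.1, 0.0], [1.0, 1.2]])
pts = np.array([[0.0, 0.0], [1.0, 1.0]])
out = rectify(
    pts,
    matrix=np.eye(2),
    offset=np.array([1.0, 0.0]),
    displacement_fn=lambda p: np.tile([1.0, 0.0], (len(p), 1)),
    residual_fn=lambda p: 0.5 * (p - target),
)
```

Each stage only has to handle what the previous one left behind, which is the decomposition idea the abstract describes. In this toy run the final stage halves the remaining error on every pass.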
Taxonomy
Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques
