Geometric Representation Learning for Document Image Rectification
Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li

TL;DR
This paper introduces DocGeoNet, a novel geometric representation learning framework that leverages 3D shape and textline attributes to improve document image rectification, outperforming existing methods on benchmark datasets.
Contribution
The paper proposes a new geometric representation learning approach incorporating 3D shape and textlines for enhanced document rectification.
Findings
Outperforms state-of-the-art methods on the DocUNet Benchmark.
Uses explicit geometric constraints, drawn from 3D shape and textlines, to improve rectification.
Demonstrates robustness on the newly proposed DIR300 test set.
Abstract
In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one. However, such geometric constraints are largely ignored in existing advanced solutions, which limits the rectification performance. To this end, we present DocGeoNet for document image rectification by introducing explicit geometric representation. Technically, two typical attributes of the document image are involved in the proposed geometric representation learning, i.e., 3D shape and textlines. Our motivation arises from the insight that 3D shape provides global unwarping cues for rectifying a distorted document image while overlooking the local structure. On the other hand, textlines complementarily provide explicit geometric constraints for local patterns. The learned geometric representation effectively bridges the distorted image and the ground truth one.…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction
