Geometric Representation Learning for Document Image Rectification

Hao Feng; Wengang Zhou; Jiajun Deng; Yuechen Wang; Houqiang Li

arXiv:2210.08161 · cs.CV · October 18, 2022 · 1 cites

TL;DR

This paper introduces DocGeoNet, a novel geometric representation learning framework that leverages 3D shape and textline attributes to improve document image rectification, outperforming existing methods on benchmark datasets.

Contribution

The paper proposes a new geometric representation learning approach incorporating 3D shape and textlines for enhanced document rectification.

Findings

01. Outperforms state-of-the-art methods on the DocUNet Benchmark.

02. Exploits geometric constraints from 3D shape and textlines to improve rectification.

03. Demonstrates robustness on the new DIR300 dataset.

Abstract

In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one. However, such geometric constraints are largely ignored in existing advanced solutions, which limits the rectification performance. To this end, we present DocGeoNet for document image rectification by introducing explicit geometric representation. Technically, two typical attributes of the document image are involved in the proposed geometric representation learning, i.e., 3D shape and textlines. Our motivation arises from the insight that 3D shape provides global unwarping cues for rectifying a distorted document image while overlooking the local structure. On the other hand, textlines complementarily provide explicit geometric constraints for local patterns. The learned geometric representation effectively bridges the distorted image and the ground truth one.…
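The abstract as shown here does not say how the rectified image is produced from the learned representation. A common formulation in document rectification, assumed here for illustration only, is to predict a per-pixel backward map and resample the distorted image with it. The sketch below shows that resampling step alone; the function name `rectify`, the `(x, y)` channel layout, and the nearest-neighbour lookup are hypothetical choices, not details taken from the paper.

```python
import numpy as np


def rectify(distorted: np.ndarray, backward_map: np.ndarray) -> np.ndarray:
    """Resample a distorted image with a per-pixel backward map.

    Assumption (not from the paper): backward_map has shape (H, W, 2) and
    backward_map[y, x] holds the (src_x, src_y) pixel coordinates in the
    distorted image that should land at output position (x, y).
    """
    h, w = distorted.shape[:2]
    # Nearest-neighbour lookup keeps the sketch short; bilinear
    # interpolation would normally give smoother results.
    src_x = np.clip(np.rint(backward_map[..., 0]), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(backward_map[..., 1]), 0, h - 1).astype(int)
    return distorted[src_y, src_x]


# Toy usage: an identity map leaves the image unchanged.
image = np.arange(12).reshape(3, 4)
ys, xs = np.mgrid[0:3, 0:4]
identity_map = np.stack([xs, ys], axis=-1).astype(float)
unchanged = rectify(image, identity_map)
```

In this framing, a rectification network would supply `backward_map`; how DocGeoNet derives its output from the 3D shape and textline representations is not covered by the sketch.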

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction

Methods: Test