DeepErase: Weakly Supervised Ink Artifact Removal in Document Text Images
W. Ronny Huang, Yike Qi, Qianqian Li, Jonathan Degange

TL;DR
DeepErase is a neural network-based preprocessor that removes ink artifacts from scanned document images, substantially improving OCR accuracy on both printed and handwritten text, including on out-of-distribution datasets.
Contribution
The paper introduces DeepErase, a weakly supervised neural approach to removing ink artifacts from document images, together with a novel data assembly method that lets the artifact segmentation network train without manual pixel-level annotations.
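One way a data assembly step can remove the need for manual annotations: if artifacts are pasted onto clean text images programmatically, the pixel-level artifact mask is known by construction. The sketch below illustrates that idea under stated assumptions. It assumes grayscale images (0 is black ink, 255 is white paper), same-sized inputs, overlaying by pixel-wise minimum, and a hypothetical ink cutoff `INK_THRESHOLD = 128`. `assemble_dirty_image` is an illustrative name, not the authors' implementation.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows: 0 = black ink, 255 = white paper
Mask = List[List[int]]   # 1 = artifact pixel, 0 = everything else

# Hypothetical cutoff: artifact-layer pixels darker than this are labeled as ink.
INK_THRESHOLD = 128


def assemble_dirty_image(clean: Image, artifact: Image) -> Tuple[Image, Mask]:
    """Overlay an artifact image onto a same-sized clean text image.

    Returns the composited "dirty" image plus a binary artifact mask. The mask
    comes for free from the artifact layer, so no manual labeling is needed.
    """
    if len(clean) != len(artifact) or any(
        len(c_row) != len(a_row) for c_row, a_row in zip(clean, artifact)
    ):
        raise ValueError("clean and artifact images must have the same shape")
    # Ink darkens paper, so keep the darker value wherever the two layers overlap.
    dirty = [
        [min(c, a) for c, a in zip(c_row, a_row)]
        for c_row, a_row in zip(clean, artifact)
    ]
    # Label every dark pixel of the artifact layer as artifact.
    mask = [[1 if a < INK_THRESHOLD else 0 for a in a_row] for a_row in artifact]
    return dirty, mask


# Usage: a vertical text stroke crossed by an underline on the bottom row.
clean = [[255, 0, 255],
         [255, 0, 255]]
artifact = [[255, 255, 255],
            [30, 30, 30]]
dirty, mask = assemble_dirty_image(clean, artifact)
print(dirty)  # [[255, 0, 255], [30, 0, 30]]
print(mask)   # [[0, 0, 0], [1, 1, 1]]
```

Taking the pixel-wise minimum is just one simple way to blend two ink layers on white paper. Randomizing artifact placement, scale, and intensity is omitted from this sketch.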
Findings
Achieves high accuracy in segmenting ink artifacts.
Substantially boosts OCR accuracy.
Performs well on out-of-distribution data such as scanned IRS tax forms.
Abstract
Paper-intensive industries like insurance, law, and government have long leveraged optical character recognition (OCR) to automatically transcribe large volumes of scanned documents into text strings for downstream processing. Even in 2019, many documents and much mail still arrive at businesses in non-digital form. Text to be extracted from real-world documents is often nestled inside rich formatting, such as tabular structures or forms with fill-in-the-blank boxes or underlines whose ink often touches or even strikes through the ink of the text itself. Further, the text region may contain random ink smudges or spurious strokes. Such ink artifacts can severely interfere with the performance of recognition algorithms and other downstream processing tasks. In this work, we propose DeepErase, a neural-based preprocessor to erase ink artifacts from text images. We devise a…
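At inference time, a preprocessor of this kind has to turn the network's artifact prediction into a cleaned image before the image is passed to OCR. This is a minimal sketch. The summary does not specify either of the following details, so both are assumptions: the network outputs a per-pixel artifact probability map, and "erasing" means painting predicted artifact pixels white. `erase_artifacts`, `PAPER_WHITE`, and the 0.5 threshold are hypothetical.

```python
from typing import List

Image = List[List[int]]      # grayscale rows: 0 = black ink, 255 = white paper
ProbMap = List[List[float]]  # assumed per-pixel artifact probabilities in [0, 1]

PAPER_WHITE = 255


def erase_artifacts(dirty: Image, artifact_prob: ProbMap, threshold: float = 0.5) -> Image:
    """Return a new image with pixels predicted as artifact painted white."""
    if len(dirty) != len(artifact_prob) or any(
        len(row) != len(p_row) for row, p_row in zip(dirty, artifact_prob)
    ):
        raise ValueError("image and probability map must have the same shape")
    return [
        [PAPER_WHITE if p >= threshold else px for px, p in zip(row, p_row)]
        for row, p_row in zip(dirty, artifact_prob)
    ]


# Usage: an underline crosses a text stroke; the (made-up) prediction flags the
# underline but not the crossing pixel, so the stroke survives erasure.
dirty = [[255, 0, 255],
         [30, 0, 30]]
artifact_prob = [[0.0, 0.1, 0.0],
                 [0.9, 0.2, 0.8]]
cleaned = erase_artifacts(dirty, artifact_prob)
print(cleaned)  # [[255, 0, 255], [255, 0, 255]]
```

Painting masked pixels white is the simplest erasure rule. If a predicted mask also covers a pixel where an artifact crosses a character, that pixel of the character is whitened as well.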
Taxonomy
Topics: Handwritten Text Recognition Techniques · Digital Media Forensic Detection · Image Processing and 3D Reconstruction
