DUET: Detection Utilizing Enhancement for Text in Scanned or Captured Documents
Eun-Soo Jung, HyeongGwan Son, Kyusam Oh, Yongkeun Yun, Soonhwan Kwon, Min Soo Kim

TL;DR
This paper introduces DUET, a deep neural model that combines text detection with enhancement tasks to improve accuracy in noisy scanned document images, utilizing synthetic data and a two-phase training process.
Contribution
The paper proposes a novel multi-task learning approach with auxiliary enhancement tasks and a two-phase training scheme using synthetic and real data for improved document text detection.
Findings
DUET outperforms existing text detection methods on real document datasets.
Synthetic data and auxiliary enhancement tasks significantly improve detection accuracy.
Two-phase training effectively leverages synthetic and real data for robust performance.
Abstract
We present a novel deep neural model for text detection in document images. For robust text detection in noisy scanned documents, we adopt the advantages of multi-task learning by adding an auxiliary task of text enhancement. Namely, our proposed model is designed to perform noise reduction and text region enhancement as well as text detection. Moreover, we enrich the training data for the model with synthesized document images that are fully labeled for text detection and enhancement, thus overcoming the insufficiency of labeled document image data. To exploit the synthetic and real data effectively, the training process is separated into two phases. The first phase trains only on synthetic data in a fully supervised manner. Then real data with only detection labels are added in the second phase. The enhancement task for the real data is weakly-supervised with information…
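The two-phase schedule and the multi-task objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Sample`, `multitask_loss`, and `samples_for_phase` are hypothetical, the mean-squared-error terms and the `enh_weight` parameter are assumptions, and the weakly-supervised enhancement term for real data is deliberately left out because the abstract is truncated before describing it.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Sample:
    """One training image, flattened to per-pixel values (hypothetical layout)."""
    det_pred: Sequence[float]            # predicted text-region scores
    det_true: Sequence[float]            # detection labels (synthetic and real)
    enh_pred: Sequence[float]            # predicted enhanced pixels
    enh_true: Optional[Sequence[float]]  # enhancement labels; None for real data


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, used here as a stand-in for both task losses."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(sample: Sample, enh_weight: float = 1.0) -> float:
    """Detection loss plus the auxiliary enhancement loss when a label exists.

    Samples without enhancement labels contribute only the detection term
    in this sketch; the paper's weak supervision is not modeled.
    """
    loss = mse(sample.det_pred, sample.det_true)
    if sample.enh_true is not None:
        loss += enh_weight * mse(sample.enh_pred, sample.enh_true)
    return loss


def samples_for_phase(phase: int, synthetic: List[Sample],
                      real: List[Sample]) -> List[Sample]:
    """Phase 1 uses synthetic data only; phase 2 adds real data to it."""
    if phase == 1:
        return list(synthetic)
    if phase == 2:
        return list(synthetic) + list(real)
    raise ValueError("phase must be 1 or 2")
```

In this sketch a synthetic sample is penalized on both tasks, while a real sample (with `enh_true=None`) is penalized on detection alone, which mirrors the label availability the abstract describes for the two data sources.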
