Efficient Generative Adversarial Networks for Color Document Image Enhancement and Binarization Using Multi-scale Feature Extraction
Rui-Yang Ju, KokSheik Wong, Jen-Shiun Chiang

TL;DR
This paper introduces an efficient multi-scale feature extraction method using Haar wavelet transformation to speed up GAN-based color document image enhancement and binarization while maintaining high performance.
Contribution
The proposed approach reduces the training and inference times of GANs for document image processing by integrating multi-scale features, without sacrificing accuracy.
Findings
Training time reduced by 10%.
Inference time reduced by 26%.
Achieved an average score of 73.79, comparable to state-of-the-art methods.
Abstract
The outcome of text recognition for degraded color documents is often unsatisfactory due to interference from various contaminants. To extract information more efficiently for text recognition, document image enhancement and binarization are often employed as preliminary steps in document analysis. Training independent generative adversarial networks (GANs) for each color channel can generate images where shadows and noise are effectively removed, which subsequently allows for efficient text information extraction. However, employing multiple GANs for different color channels requires long training and inference times. To reduce both the training and inference times of these preliminary steps, we propose an efficient method based on multi-scale feature extraction, which incorporates Haar wavelet transformation and normalization to process document images before submitting them to GANs…
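The abstract names Haar wavelet transformation and normalization as the preprocessing step but gives no further detail. The sketch below only illustrates that general idea under stated assumptions, and it is not the authors' implementation. It assumes a single-channel image stored as nested lists with even height and width, an orthonormal 2D Haar transform, min-max normalization per sub-band (the paper's actual normalization scheme is not stated here), and multiple scales obtained by repeatedly decomposing the low-frequency (LL) band. The function names `haar_dwt2`, `min_max_normalize` and `multiscale_features` are hypothetical.

```python
def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform of an image with even height and width.

    Returns four half-resolution sub-bands (LL, LH, HL, HH). Naming conventions
    for LH/HL differ between libraries; here LH holds row differences and HL
    holds column differences.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 2)  # local average (low-pass in both directions)
            r_lh.append((a + b - c - d) / 2)  # top row minus bottom row
            r_hl.append((a - b + c - d) / 2)  # left column minus right column
            r_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def min_max_normalize(band, eps=1e-8):
    """Rescale a sub-band to [0, 1]; a constant band maps to all zeros (assumed choice)."""
    lo = min(min(row) for row in band)
    hi = max(max(row) for row in band)
    scale = hi - lo
    if scale < eps:
        return [[0.0] * len(row) for row in band]
    return [[(v - lo) / scale for v in row] for row in band]


def multiscale_features(img, levels=2):
    """Decompose `levels` times, each time recursing on the LL band.

    Returns a list with one entry per scale; each entry holds the four
    normalized sub-bands, and each level halves the spatial resolution.
    """
    features = []
    current = img
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(current)
        features.append([min_max_normalize(band) for band in (ll, lh, hl, hh)])
        current = ll  # the next scale is computed from the low-frequency band
    return features


if __name__ == "__main__":
    toy = [[1, 2], [3, 4]]
    print(haar_dwt2(toy))  # ([[5.0]], [[-2.0]], [[-1.0]], [[0.0]])
```

For a color document, one plausible use (again an assumption) is to run `multiscale_features` on each color channel separately, matching the abstract's setting of one GAN per channel. Because the transform is orthonormal, the sum of squared coefficients equals that of the input (30 for the toy example above), so the decomposition loses no information before normalization.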
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction
