MFE-GAN: Efficient GAN-based Framework for Document Image Enhancement and Binarization with Multi-scale Feature Extraction
Rui-Yang Ju, KokSheik Wong, Yanlin Jin, Jen-Shiun Chiang

TL;DR
MFE-GAN introduces a multi-scale feature extraction framework for document image enhancement and binarization that significantly reduces training and inference times while maintaining enhancement and binarization performance comparable to state-of-the-art methods.
Contribution
The paper proposes a novel efficient GAN framework with multi-scale feature extraction, incorporating Haar wavelet transformation and new network components, to improve speed without sacrificing accuracy.
Findings
Reduces training and inference times compared to state-of-the-art methods
Maintains comparable enhancement and binarization performance
Effective on diverse datasets such as Benchmark, Nabuco, and CMATERdb
Abstract
Document image enhancement and binarization are commonly performed prior to document analysis and recognition tasks for improving the efficiency and accuracy of optical character recognition (OCR) systems. This is because directly recognizing text in degraded documents, particularly in color images, often results in unsatisfactory recognition performance. To address these issues, existing methods train independent generative adversarial networks (GANs) for different color channels to remove shadows and noise, which, in turn, facilitates efficient text information extraction. However, deploying multiple GANs results in long training and inference times. To reduce both training and inference times of document image enhancement and binarization models, we propose MFE-GAN, an efficient GAN-based framework with multi-scale feature extraction (MFE), which incorporates Haar wavelet…
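The abstract mentions the Haar wavelet transformation but is cut off before saying how MFE-GAN applies it. As a rough illustration of the transform itself, and not of the authors' pipeline, the sketch below computes a generic single-level 2D orthonormal Haar decomposition of a grayscale image stored as a list of rows. The function name `haar_dwt2` and the sub-band labels are assumptions made for this example; LH/HL naming conventions differ between libraries.

```python
def haar_dwt2(image):
    """Single-level 2D orthonormal Haar transform (illustrative sketch).

    `image` is a list of equal-length rows of numbers with even height
    and width. Returns four half-resolution sub-bands (ll, lh, hl, hh).
    The sub-band names are an assumption; conventions vary by library.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")

    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            # The four pixels of one non-overlapping 2x2 block.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2)  # local average (approximation)
            lh_row.append((a + b - d - e) / 2)  # top row minus bottom row
            hl_row.append((a - b + d - e) / 2)  # left column minus right column
            hh_row.append((a - b - d + e) / 2)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


if __name__ == "__main__":
    ll, lh, hl, hh = haar_dwt2([[1, 2], [3, 4]])
    print(ll, lh, hl, hh)  # [[5.0]] [[-2.0]] [[-1.0]] [[0.0]]
```

Each sub-band has half the height and width of the input, so a network that operates on the sub-bands processes fewer spatial positions per channel. Because the transform is orthonormal, the sum of squared values is preserved (30 for the 2x2 example above).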
Taxonomy
Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
