MFE-GAN: Efficient GAN-based Framework for Document Image Enhancement and Binarization with Multi-scale Feature Extraction

Rui-Yang Ju; KokSheik Wong; Yanlin Jin; Jen-Shiun Chiang

arXiv:2512.14114 · cs.CV · December 17, 2025

Open Access

TL;DR

MFE-GAN is an efficient GAN-based framework with multi-scale feature extraction for document image enhancement and binarization. It reduces training and inference times while keeping enhancement and binarization performance comparable to state-of-the-art methods.

Contribution

The paper proposes an efficient GAN-based framework with multi-scale feature extraction that incorporates Haar wavelet transformation and new network components to reduce training and inference times without sacrificing accuracy.
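To make the Haar wavelet transformation mentioned above concrete, here is a minimal NumPy sketch of a generic single-level 2D Haar decomposition and its inverse. It is not the authors' implementation: this page does not describe how MFE-GAN applies the transform, and the function names `haar_dwt2` and `haar_idwt2`, the orthonormal scaling, and the sub-band labels are assumptions chosen for illustration.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level 2D Haar transform (orthonormal scaling).

    Hypothetical helper for illustration. Returns four sub-bands, each
    half the height and width of the input: one low-frequency average
    (ll) and three detail bands (lh, hl, hh).
    """
    img = np.asarray(img, dtype=np.float64)
    if img.ndim != 2 or img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    # The four pixels of every non-overlapping 2x2 block.
    a = img[0::2, 0::2]  # top-left
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right

    ll = (a + b + c + d) / 2.0  # block average (scaled)
    lh = (a - b + c - d) / 2.0  # difference between left and right columns
    hl = (a + b - c - d) / 2.0  # difference between top and bottom rows
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2 and rebuild the full-resolution array."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=np.float64)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


if __name__ == "__main__":
    # Stand-in for one channel of a document image.
    channel = np.random.default_rng(0).random((8, 8))
    ll, lh, hl, hh = haar_dwt2(channel)
    print("sub-band shape:", ll.shape)
    print("lossless:", np.allclose(haar_idwt2(ll, lh, hl, hh), channel))
```

Each sub-band holds a quarter of the original pixels, so a network that processes one sub-band at a time sees a smaller input. Whether and how MFE-GAN exploits this is not stated on this page.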

Findings

01. Reduces training and inference times compared to state-of-the-art methods

02. Maintains comparable enhancement and binarization performance

03. Effective across diverse datasets, including Benchmark, Nabuco, and CMATERdb

Abstract

Document image enhancement and binarization are commonly performed prior to document analysis and recognition tasks for improving the efficiency and accuracy of optical character recognition (OCR) systems. This is because directly recognizing text in degraded documents, particularly in color images, often results in unsatisfactory recognition performance. To address these issues, existing methods train independent generative adversarial networks (GANs) for different color channels to remove shadows and noise, which, in turn, facilitates efficient text information extraction. However, deploying multiple GANs results in long training and inference times. To reduce both training and inference times of document image enhancement and binarization models, we propose MFE-GAN, an efficient GAN-based framework with multi-scale feature extraction (MFE), which incorporates Haar wavelet…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques