Efficient Generative Adversarial Networks for Color Document Image Enhancement and Binarization Using Multi-scale Feature Extraction

Rui-Yang Ju; KokSheik Wong; Jen-Shiun Chiang

arXiv:2407.04231·cs.CV·December 2, 2025

TL;DR

This paper introduces an efficient multi-scale feature extraction method that uses the Haar wavelet transformation to speed up GAN-based color document image enhancement and binarization while maintaining high performance.

Contribution

The proposed approach reduces training and inference times of GANs for document image processing by integrating multi-scale features, without sacrificing accuracy.

Findings

01. Training time reduced by 10%.
02. Inference time reduced by 26%.
03. Achieved an average score of 73.79, comparable to the state of the art.

Abstract

The outcome of text recognition for degraded color documents is often unsatisfactory due to interference from various contaminants. To extract information more efficiently for text recognition, document image enhancement and binarization are often employed as preliminary steps in document analysis. Training independent generative adversarial networks (GANs) for each color channel can generate images where shadows and noise are effectively removed, which subsequently allows for efficient text information extraction. However, employing multiple GANs for different color channels requires long training and inference times. To reduce both the training and inference times of these preliminary steps, we propose an efficient method based on multi-scale feature extraction, which incorporates Haar wavelet transformation and normalization to process document images before submitting them to GANs…
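
As a rough illustration of the preprocessing idea in the abstract (a Haar wavelet transformation followed by normalization, applied per color channel before the images reach the GANs), here is a minimal NumPy sketch. The abstract is truncated and does not say which wavelet subbands are kept or how normalization is defined, so the choice of the low-frequency (LL) subband, the min-max scaling to [0, 255], and the function names `haar_ll`, `min_max_normalize`, and `preprocess_channels` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def haar_ll(channel: np.ndarray) -> np.ndarray:
    """Single-level 2D Haar approximation (LL) subband of one image channel."""
    h, w = channel.shape
    # Crop to even dimensions so the channel tiles exactly into 2x2 blocks.
    x = channel[: h - h % 2, : w - w % 2].astype(np.float64)
    a = x[0::2, 0::2]  # top-left pixel of each block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    # Orthonormal 2D Haar low-pass: sum of the block divided by 2.
    return (a + b + c + d) / 2.0


def min_max_normalize(x: np.ndarray, lo: float = 0.0, hi: float = 255.0) -> np.ndarray:
    """Rescale values linearly to [lo, hi] (assumed normalization scheme)."""
    span = x.max() - x.min()
    if span == 0:
        # A constant input has no range to stretch; map it to the lower bound.
        return np.full_like(x, lo)
    return (x - x.min()) / span * (hi - lo) + lo


def preprocess_channels(image: np.ndarray) -> list:
    """Split an H x W x C image into channels and return one normalized
    half-resolution LL subband per channel (hypothetical pipeline)."""
    return [min_max_normalize(haar_ll(image[..., k])) for k in range(image.shape[-1])]
```

Under these assumptions, each channel handed to a downstream network has half the height and width of the original, which is one plausible way such a transform could cut computation.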

Code & Models

Repositories

ruiyangju/efficient_document_image_binarization
PyTorch · Official

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction