Automatic Document Image Binarization using Bayesian Optimization
Ekta Vats, Anders Hast, Prashant Singh

TL;DR
This paper introduces an automatic document image binarization method that combines a two band-pass filtering approach with Bayesian optimization to adaptively select optimal parameters, improving segmentation quality on degraded documents.
Contribution
The paper pairs a two band-pass filtering step for background noise removal with Bayesian optimization for automatic hyperparameter selection, so the binarization parameters no longer need to be tuned by hand for each document.
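To make the hyperparameter-selection idea concrete, here is a minimal sketch of a Bayesian optimization loop over a single parameter, written with the Python standard library only. It is an illustration under stated assumptions, not the authors' implementation: the summary does not specify the surrogate model, acquisition function, or objective, so the Gaussian-process surrogate with a squared-exponential kernel, the expected-improvement rule, the candidate grid, and the `toy_score` objective (a stand-in for a binarization quality score) are all hypothetical choices.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of a GP surrogate at x_new."""
    mu = sum(ys) / len(ys)  # centre the observations around their mean
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [y - mu for y in ys]))
    k = [rbf(a, x_new) for a in xs]
    mean = mu + sum(ki * ai for ki, ai in zip(k, alpha))
    v = forward_sub(L, k)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the best score so far (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, lo, hi, n_init=3, n_iter=10, n_cand=101):
    """Maximize objective on [lo, hi] using a GP surrogate and EI."""
    xs = [lo + (hi - lo) * i / (n_init - 1) for i in range(n_init)]
    ys = [objective(x) for x in xs]
    cands = [lo + (hi - lo) * i / (n_cand - 1) for i in range(n_cand)]
    for _ in range(n_iter):
        best = max(ys)
        # Only consider grid points that have not been evaluated yet.
        pool = [c for c in cands if all(abs(c - x) > 1e-9 for x in xs)]
        nxt = max(pool, key=lambda c: expected_improvement(
            *gp_posterior(xs, ys, c), best))
        xs.append(nxt)
        ys.append(objective(nxt))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]

def toy_score(t):
    """Hypothetical quality score as a function of one parameter t."""
    return math.exp(-((t - 0.37) / 0.15) ** 2)

best_t, best_score = bayes_opt(toy_score, 0.0, 1.0)
print(f"best parameter: {best_t:.2f}, score: {best_score:.3f}")
```

In a real pipeline the objective would run the binarization with the proposed parameters and return a quality measure. The surrogate is refit from scratch for every candidate here to keep the sketch short; a practical version would factor the kernel matrix once per iteration.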
Findings
Effective on DIBCO and H-DIBCO datasets
Outperforms some existing binarization methods
Automatically adapts to various degradation levels
Abstract
Document image binarization is often a challenging task due to various forms of degradation. Although there exist several binarization techniques in the literature, the binarized image is typically sensitive to control parameter settings of the employed technique. This paper presents an automatic document image binarization algorithm to segment the text from heavily degraded document images. The proposed technique uses a two band-pass filtering approach for background noise removal, and Bayesian optimization for automatic hyperparameter selection for optimal results. The effectiveness of the proposed binarization technique is empirically demonstrated on the Document Image Binarization Competition (DIBCO) and the Handwritten Document Image Binarization Competition (H-DIBCO) datasets.
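The abstract does not describe how the band-pass filters are built, so the following is only a generic illustration of why band-pass filtering helps with background removal. It uses a difference of Gaussians as a stand-in band-pass filter (an assumption, not the paper's filter design) on a small synthetic page with uneven illumination. The function names, the `sigma_small`, `sigma_large`, and `threshold` values, and the synthetic image are all hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    w = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def blur_rows(img, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(kernel[k + r] * row[min(max(x + k, 0), n - 1)]
                for k in range(-r, r + 1))
            for x in range(n)
        ])
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma):
    """Separable 2-D Gaussian blur: rows first, then columns."""
    k = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(img, k)), k))

def band_pass(img, sigma_small, sigma_large):
    """Difference of Gaussians: keeps detail between the two scales."""
    fine = gaussian_blur(img, sigma_small)
    coarse = gaussian_blur(img, sigma_large)
    return [[f - c for f, c in zip(fr, cr)] for fr, cr in zip(fine, coarse)]

def binarize(img, sigma_small=1.0, sigma_large=4.0, threshold=-0.15):
    """Mark pixels that are clearly darker than their surroundings as text (1)."""
    bp = band_pass(img, sigma_small, sigma_large)
    return [[1 if v < threshold else 0 for v in row] for row in bp]

# Synthetic 16x16 page: brightness ramps from 0.6 to 0.9 (uneven background),
# with a dark two-pixel-wide vertical stroke standing in for text.
size = 16
page = [[0.6 + 0.3 * x / (size - 1) for x in range(size)] for _ in range(size)]
for y in range(3, 13):
    for x in (7, 8):
        page[y][x] = 0.1

mask = binarize(page)
for row in mask:
    print("".join("#" if v else "." for v in row))
```

Subtracting the coarse blur removes the slowly varying background, so a single threshold works across the whole page even though the left side is darker than the right. The scale and threshold values are exactly the kind of control parameters that the paper proposes to select automatically rather than by hand.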
