Unsupervised Neural Domain Adaptation for Document Image Binarization
Francisco J. Castellanos, Antonio-Javier Gallego, Jorge Calvo-Zaragoza

TL;DR
This paper introduces an unsupervised neural domain adaptation method for document image binarization. It handles new document types without requiring labeled data for them, and it measures domain similarity beforehand to guide the adaptation decision.
Contribution
It proposes a novel approach combining neural networks and domain adaptation with an innovative domain similarity measure to improve unsupervised document binarization.
Findings
Successfully binarizes new document domains without labeled data
Effectively handles multiple domain combinations in experiments
Domain similarity measurement guides adaptation decisions
Abstract
Binarization is a well-known image processing task, whose objective is to separate the foreground of an image from the background. One of the many tasks for which it is useful is that of preprocessing document images in order to identify relevant information, such as text or symbols. The wide variety of document types, alphabets, and formats makes binarization challenging. There are multiple proposals with which to solve this problem, from classical manually-adjusted methods, to more recent approaches based on machine learning. The latter techniques require a large amount of training data in order to obtain good results; however, labeling a portion of each existing collection of documents is not feasible in practice. This is a common problem in supervised learning, which can be addressed by using the so-called Domain Adaptation (DA) techniques. These techniques take advantage of the…
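To make the task concrete, here is a minimal sketch of the kind of classical thresholding baseline the abstract alludes to. It uses Otsu's global threshold as a stand-in; this is not the paper's method, and the function names `otsu_threshold` and `binarize` are hypothetical. It assumes an 8-bit grayscale image with dark ink on a lighter background.

```python
import numpy as np


def otsu_threshold(gray):
    """Return Otsu's global threshold for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # probability of the "dark" class per candidate threshold
    mu = np.cumsum(prob * np.arange(256))   # cumulative intensity mean
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance; it is undefined where one class is empty.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0
    return int(np.argmax(sigma_b))


def binarize(gray):
    """Return a boolean mask that is True for foreground (ink) pixels.

    Assumption: ink is darker than the background.
    """
    return gray <= otsu_threshold(gray)
```

A single global threshold like this tends to break down on documents with stains, bleed-through, or uneven lighting, which is part of the motivation for learned approaches and, in turn, for adapting them to unlabeled collections.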
