Unsupervised Neural Domain Adaptation for Document Image Binarization

Francisco J. Castellanos; Antonio-Javier Gallego; Jorge Calvo-Zaragoza

arXiv:2012.01204·cs.CV·July 2, 2021

TL;DR

This paper introduces an unsupervised neural domain adaptation method for document image binarization. It binarizes documents from a new domain without labeled data for that domain, and it measures domain similarity beforehand to decide whether adaptation is worthwhile.

Contribution

It combines neural network binarization with domain adaptation and adds a domain similarity measure, computed before adaptation, to improve binarization of document collections that lack labels.

Findings

01

Successfully binarizes new document domains without labeled data

02

Effectively handles multiple domain combinations in experiments

03

Domain similarity measurement guides adaptation decisions

Abstract

Binarization is a well-known image processing task, whose objective is to separate the foreground of an image from the background. One of the many tasks for which it is useful is that of preprocessing document images in order to identify relevant information, such as text or symbols. The wide variety of document types, alphabets, and formats makes binarization challenging. There are multiple proposals with which to solve this problem, from classical manually-adjusted methods, to more recent approaches based on machine learning. The latter techniques require a large amount of training data in order to obtain good results; however, labeling a portion of each existing collection of documents is not feasible in practice. This is a common problem in supervised learning, which can be addressed by using the so-called Domain Adaptation (DA) techniques. These techniques take advantage of the…
