Lifelong learning for text retrieval and recognition in historical handwritten document collections

Lambert Schomaker

arXiv:1912.05156·cs.CV·December 12, 2019

TL;DR

This paper discusses the development of a lifelong learning system for text retrieval and recognition in large, diverse collections of historical handwritten documents, emphasizing scalability and evolving ground truth.

Contribution

It introduces the 'ball-park principle', which guides the transition from traditional methods to deep learning as the amount of labeled data grows.

Findings

01. Deep learning offers high potential but requires scalable data labeling.

02. The 'ball-park principle' helps manage the evolution of learning approaches.

03. The system addresses variability across scripts and languages in historical documents.

Abstract

This chapter provides an overview of the problems that need to be dealt with when constructing a lifelong-learning retrieval, recognition and indexing engine for large historical document collections in multiple scripts and languages, the Monk system. This application is highly variable over time, since the continuous labeling by end users changes the concept of what constitutes a 'ground truth'. Although current advances in deep learning provide a huge potential in this application domain, the scale of the problem, i.e., more than 520 hugely diverse books, documents and manuscripts, precludes the current meticulous and painstaking human effort which is required in designing and developing successful deep-learning systems. The ball-park principle is introduced, which describes the evolution from the sparsely-labeled stage that can only be addressed by traditional methods or…

Taxonomy

Topics: Handwritten Text Recognition Techniques