Neural Word Search in Historical Manuscript Collections
Tomas Wilkinson, Jonas Lindström, Anders Brun

TL;DR
This paper introduces neural network models for efficient word spotting in historical manuscripts, enabling large-scale digital humanities research with minimal training data and surpassing previous methods in accuracy.
Contribution
The authors propose the Ctrl-F-Net and Ctrl-F-Mini models for end-to-end word spotting, demonstrating improved performance and speed on benchmark datasets and real-world collections.
Findings
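Ctrl-F-Net surpasses the previous state of the art in word spotting on common benchmark datasets.
Ctrl-F-Mini is faster with similar performance, but is limited to more easily segmented manuscripts.
Applied with historians to a collection of over 100,000 pages written across two centuries, Ctrl-F-Net was trained on only 11 pages.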
Abstract
We address the problem of segmenting and retrieving word images in collections of historical manuscripts given a text query. This is commonly referred to as "word spotting". To this end, we first propose an end-to-end trainable model based on deep neural networks that we dub Ctrl-F-Net. The model simultaneously generates region proposals and embeds them into a word embedding space, wherein a search is performed. We further introduce a simplified version called Ctrl-F-Mini. It is faster with similar performance, though it is limited to more easily segmented manuscripts. We evaluate both models on common benchmark datasets and surpass the previous state of the art. Finally, in collaboration with historians, we employ the Ctrl-F-Net to search within a large manuscript collection of over 100 thousand pages, written across two centuries. With only 11 training pages, we enable large scale…
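The search step described in the abstract (embed candidate word regions, embed the text query, then rank regions in the shared embedding space) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which word embedding is used, so `embed_text` is a toy character-count histogram, and the region embeddings are faked by embedding known transcriptions rather than being predicted by a network from image regions. The names `embed_text`, `cosine`, and `search` are hypothetical and not taken from the paper.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def embed_text(word):
    """Toy stand-in embedding: L2-normalised character-count histogram.

    Assumption for illustration only; not the embedding used in the paper.
    """
    counts = [0.0] * len(ALPHABET)
    for ch in word.lower():
        idx = ALPHABET.find(ch)
        if idx >= 0:
            counts[idx] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm > 0 else counts


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def search(query, region_embeddings, top_k=3):
    """Return (region_index, score) pairs, best match first."""
    q = embed_text(query)
    scored = [(i, cosine(q, emb)) for i, emb in enumerate(region_embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# In a real system each region embedding would be predicted from a proposed
# image region; here they are faked from hypothetical transcriptions.
region_words = ["kingdom", "letter", "church", "latter"]
region_embeddings = [embed_text(w) for w in region_words]

results = search("letter", region_embeddings, top_k=2)
for idx, score in results:
    print(region_words[idx], round(score, 3))
```

In this toy setup the exact match ranks first and the near-spelling "latter" second, which mirrors the retrieval idea: a query never has to be matched against transcriptions, only against vectors in the embedding space.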
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
