Evaluation of the Effect of Improper Segmentation on Word Spotting
Sounak Dey, Anguelos Nicolaou, Josep Llados, and Umapada Pal

TL;DR
This paper introduces an experimental framework to quantify how imperfect word segmentation impacts the performance of word spotting methods in historical document analysis, highlighting the importance of segmentation quality.
Contribution
It proposes a systematic approach to evaluate the effect of segmentation errors on word spotting accuracy using distorted datasets and applies it to multiple datasets and methods.
Findings
Segmentation quality significantly affects word spotting performance.
The framework provides end-to-end performance estimates under realistic conditions.
State-of-the-art methods' robustness varies with segmentation distortions.
Abstract
Word spotting is an important recognition task in historical document analysis. In most cases, methods are developed and evaluated assuming perfect word segmentation. In this paper we propose an experimental framework to quantify the effect that the quality of word segmentation has on the performance achieved by word spotting methods under identical, unbiased conditions. The framework consists of generating systematic distortions of the segmentation and retrieving the original queries from the distorted dataset. We apply the framework to the George Washington and Barcelona Marriage datasets and to several established and state-of-the-art methods. The experiments allow for an estimate of the end-to-end performance of word spotting methods.
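The distortion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the box format, the `distort_box` and `iou` helpers, and the uniform edge-jitter model controlled by `level` are all assumptions made for illustration. The idea is only to show how a ground-truth word box might be perturbed in a controlled way and how far the result drifts from the original.

```python
import random
from typing import Tuple

# Hypothetical box format: (left, top, right, bottom) in pixels.
Box = Tuple[float, float, float, float]


def distort_box(box: Box, level: float, rng: random.Random) -> Box:
    """Jitter each edge by up to `level` times the box width or height.

    Assumed distortion model, not taken from the paper. `level` must lie
    in [0, 0.5) so that the distorted box can never become inverted.
    """
    if not 0.0 <= level < 0.5:
        raise ValueError("level must be in [0, 0.5)")
    left, top, right, bottom = box
    dx = level * (right - left)
    dy = level * (bottom - top)
    return (
        left + rng.uniform(-dx, dx),
        top + rng.uniform(-dy, dy),
        right + rng.uniform(-dx, dx),
        bottom + rng.uniform(-dy, dy),
    )


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the distortions are repeatable
    word = (10.0, 20.0, 110.0, 60.0)
    for level in (0.0, 0.1, 0.2, 0.4):
        distorted = distort_box(word, level, rng)
        print(f"level={level:.1f}  IoU with ground truth={iou(word, distorted):.3f}")
```

In a full experiment along these lines, every word box in the dataset would be distorted at each level, the word images re-cropped, and the original queries retrieved from the distorted collection, so that retrieval performance can be examined as a function of the distortion level.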
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Topic Modeling
