Asking questions on handwritten document collections
Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, CV Jawahar

TL;DR
This paper introduces a recognition-free question answering method for handwritten documents. It locates the document snippet that contains the answer without recognizing the text, which suits historical collections where OCR is unreliable, and it is evaluated on two new datasets.
Contribution
The paper proposes a novel recognition-free QA approach for handwritten documents using deep embeddings, and introduces two new datasets for evaluation.
Findings
The recognition-free approach locates answer snippets in handwritten documents without transcribing them.
The method outperforms OCR-based approaches on new datasets.
The two new datasets, HW-SQuAD and BenthamQA, support future research on QA over handwritten documents.
Abstract
This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers act as a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network which can project both textual words and word images into a common sub-space. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We…
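The retrieval idea described in the abstract can be illustrated with a minimal sketch. It assumes that question words and word images have already been projected into the same vector space by some embedding network; the toy vectors, the function names (`score_snippet`, `rank_snippets`), and the max-then-mean scoring rule are all hypothetical choices made for illustration, not the aggregation used in the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def score_snippet(query_vecs, snippet_vecs):
    """Score one snippet against a question (hypothetical scoring rule).

    query_vecs:   (num_query_words, dim) embeddings of the question words.
    snippet_vecs: (num_word_images, dim) embeddings of the snippet's word images.
    Each query word is matched to its most similar word image; the snippet
    score is the mean of those best-match cosine similarities.
    """
    q = l2_normalize(query_vecs)
    s = l2_normalize(snippet_vecs)
    sims = q @ s.T  # shape: (num_query_words, num_word_images)
    return float(sims.max(axis=1).mean())

def rank_snippets(query_vecs, snippets):
    """Return snippet indices sorted from best to worst, plus the raw scores."""
    scores = [score_snippet(query_vecs, s) for s in snippets]
    order = sorted(range(len(snippets)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy data standing in for the outputs of an embedding network (assumed).
query = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
snippets = [
    # Snippet 0: word images unrelated to the query words.
    np.array([[0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0]]),
    # Snippet 1: contains word images close to both query words.
    np.array([[0.9, 0.1, 0.0],
              [0.1, 0.9, 0.0],
              [0.0, 0.0, 1.0]]),
]

order, scores = rank_snippets(query, snippets)
print(order, scores)
```

Because matching happens entirely in the embedding space, no transcription of the handwritten words is needed at any point; the quality of the ranking depends on how well the embedding network aligns text and word images.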
