Asking questions on handwritten document collections

Minesh Mathew; Lluis Gomez; Dimosthenis Karatzas; CV Jawahar

arXiv:2110.00711·cs.CV·October 5, 2021

TL;DR

This paper introduces a recognition-free question answering method for handwritten documents. Instead of producing a textual answer, it locates the document snippet that contains the answer, without recognizing the text. This makes it suitable for historical collections where OCR is unreliable. The method is evaluated on two new datasets, HW-SQuAD and BenthamQA.

Contribution

The paper proposes a novel recognition-free QA approach for handwritten documents using deep embeddings, and introduces two new datasets for evaluation.

Findings

01 The recognition-free approach effectively locates answer snippets in handwritten documents.

02 The method outperforms OCR-based approaches on the new datasets.

03 The new datasets, HW-SQuAD and BenthamQA, facilitate future research in handwritten document QA.

Abstract

This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers act as a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network which can project both textual words and word images into a common sub-space. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We…
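
To make the retrieval idea concrete, here is a minimal sketch of ranking snippets in a shared embedding space. It assumes that question words and word images have already been projected to vectors of the same dimension by some embedding network; the function names (`score_snippet`, `rank_snippets`), the max-then-mean aggregation, and the toy vectors are all illustrative assumptions, not the scoring scheme described in the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def score_snippet(query_vecs, snippet_vecs):
    """Hypothetical score: for each question word, take its best-matching
    word image in the snippet, then average over the question words."""
    q = l2_normalize(np.asarray(query_vecs, dtype=float))
    s = l2_normalize(np.asarray(snippet_vecs, dtype=float))
    sims = q @ s.T  # shape: (num_question_words, num_word_images)
    return float(sims.max(axis=1).mean())

def rank_snippets(query_vecs, snippets):
    """Return snippet indices sorted from best to worst, plus the raw scores."""
    scores = [score_snippet(query_vecs, s) for s in snippets]
    order = sorted(range(len(snippets)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy 3-dimensional embeddings standing in for real network outputs (assumed).
query = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
snippets = [
    np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 2.0]]),                   # matches no question word
    np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.0, 1.0]]),  # matches both
    np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]),                   # matches one
]
order, scores = rank_snippets(query, snippets)
print(order)
```

In this toy setup the snippet whose word images lie close to both question words ranks first. A real system would likely also need to handle stop words and vocabulary mismatch between question and answer context, which this sketch ignores.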
