TextMatcher: Cross-Attentional Neural Network to Compare Image and Text

Valentina Arrigoni; Luisa Repele; Dario Marino Saccavino

arXiv:2205.05507·cs.CV·October 7, 2022

Open Access

TL;DR

TextMatcher is a neural network that uses cross-attention to decide whether the single-line text in an image corresponds to a candidate transcription. On the IAM dataset it achieves higher performance than a baseline and models designed for related problems, while running faster at inference time.

Contribution

The paper introduces the first machine-learning model designed specifically for text matching between an image and a candidate transcription. The model compares the two inputs with a cross-attention mechanism over their embedding representations and is trained end to end.

Findings

01. Outperforms a baseline and existing models designed for related problems on the IAM dataset.

02. Achieves higher performance across a variety of configurations.

03. Runs faster than the compared models at inference time.

Abstract

We study a novel multimodal-learning problem, which we call text matching: given an image containing a single-line text and a candidate text transcription, the goal is to assess whether the text represented in the image corresponds to the candidate text. We devise the first machine-learning model specifically designed for this problem. The proposed model, termed TextMatcher, compares the two inputs by applying a cross-attention mechanism over the embedding representations of image and text, and it is trained in an end-to-end fashion. We extensively evaluate the empirical performance of TextMatcher on the popular IAM dataset. Results attest that, compared to a baseline and existing models designed for related problems, TextMatcher achieves higher performance on a variety of configurations, while at the same time running faster at inference time. We also showcase TextMatcher in a…
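The cross-attention comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract does not specify layers, dimensions, or the scoring head, so everything here is an assumption made for illustration. In particular, using text embeddings as queries and image embeddings as keys and values, the helper names (`cross_attention`, `match_score`), and the cosine-based score are all hypothetical. There are no learned projections and no training, which the real model would have.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query (assumed here: one text-token embedding) attends over all
    keys (assumed here: image-column embeddings) and returns a weighted
    average of the values.
    """
    scale = math.sqrt(len(keys[0]))
    contexts, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        ctx = [sum(wi * v[j] for wi, v in zip(w, values))
               for j in range(len(values[0]))]
        weights.append(w)
        contexts.append(ctx)
    return contexts, weights

def cosine(a, b):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def match_score(text_emb, image_emb):
    """Hypothetical matching score: mean cosine similarity between each
    text embedding and the image context it attends to. A trained model
    would use a learned classifier here instead."""
    contexts, _ = cross_attention(text_emb, image_emb, image_emb)
    sims = [cosine(t, c) for t, c in zip(text_emb, contexts)]
    return sum(sims) / len(sims)

# Toy 4-dimensional embeddings (made up for the example).
image_emb = [[4.0, 0.0, 0.0, 0.0],
             [0.0, 4.0, 0.0, 0.0],
             [0.0, 0.0, 4.0, 0.0]]
matching_text = [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0]]
mismatched_text = [[0.0, 0.0, 0.0, 4.0], [0.0, 0.0, 0.0, 4.0]]

print(match_score(matching_text, image_emb))
print(match_score(mismatched_text, image_emb))
```

In this toy setup the matching transcription scores close to 1 because each text embedding finds an aligned image embedding to attend to, while the mismatched one scores 0 because its embeddings are orthogonal to everything in the image.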

Taxonomy

Topics: Handwritten Text Recognition Techniques · Topic Modeling · Natural Language Processing Techniques