On Calibration of Scene-Text Recognition Models

Ron Slossberg; Oron Anschel; Amir Markovitz; Ron Litman; Aviad Aberdam; Shahar Tsiper; Shai Mazor; Jon Wu; R. Manmatha

arXiv:2012.12643·cs.CV·December 24, 2020·1 cites



Open Access

TL;DR

This paper investigates word-level confidence calibration in scene-text recognition (STR) models, shows that several recent models are consistently overconfident, and proposes sequence-based calibration methods that reduce calibration error and improve accuracy.

Contribution

It introduces sequence-based calibration techniques for STR models, reducing calibration error and enhancing recognition accuracy.

Findings

01

Calibration error is reduced by up to a factor of nearly 7.

02

Sequence calibration improves word-level confidence estimates.

03

Applying calibration as preprocessing boosts accuracy.
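
To make the idea of calibrating at the word level concrete, here is a minimal sketch, not the authors' procedure. It assumes temperature scaling as the underlying calibration method (the summary does not name one), assumes word confidence is the product of per-character top probabilities, and fits a single temperature by grid search against word-level correctness. All function names and the toy logits are hypothetical.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over one character position."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def word_confidence(char_logits, temperature):
    """Assumed word confidence: product of the top probability at each position."""
    conf = 1.0
    for logits in char_logits:
        conf *= max(softmax(logits, temperature))
    return conf

def word_nll(words, correct, temperature, eps=1e-12):
    """Binary negative log-likelihood of word correctness given word confidence."""
    total = 0.0
    for char_logits, ok in zip(words, correct):
        p = min(max(word_confidence(char_logits, temperature), eps), 1 - eps)
        total -= math.log(p) if ok else math.log(1 - p)
    return total / len(words)

def fit_temperature(words, correct, grid=None):
    """Pick the temperature on a coarse grid that minimizes word-level NLL."""
    grid = grid or [0.5 + 0.1 * i for i in range(46)]  # 0.5 .. 5.0
    return min(grid, key=lambda t: word_nll(words, correct, t))

# Toy, made-up data: four two-character words with sharp logits, half of them wrong.
words = [[[5.0, 0.0, 0.0], [5.0, 0.0, 0.0]] for _ in range(4)]
correct = [True, False, True, False]
best_t = fit_temperature(words, correct)
```

On this toy set the model is overconfident (word confidence near 0.97 at temperature 1, accuracy 0.5), so the fitted temperature comes out above 1 and pulls word confidence toward the observed accuracy. The key design choice is that the objective is evaluated on whole words rather than on individual characters.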

Abstract

In this work, we study the problem of word-level confidence calibration for scene-text recognition (STR). Although the topic of confidence calibration has been an active research area for the last several decades, the case of structured and sequence prediction calibration has been scarcely explored. We analyze several recent STR methods and show that they are consistently overconfident. We then focus on the calibration of STR models on the word rather than the character level. In particular, we demonstrate that for attention-based decoders, calibration of individual character predictions increases word-level calibration error compared to an uncalibrated model. In addition, we apply existing calibration methodologies as well as new sequence-based extensions to numerous STR models, demonstrating reduced calibration error by up to a factor of nearly 7. Finally, we show consistently…
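
As an illustration of what "word-level calibration error" can mean in practice, the sketch below computes the standard expected calibration error (ECE) over word confidences. The abstract does not say which metric the paper uses, so ECE with equal-width bins is an assumption here, as is defining word confidence as the product of per-character probabilities. The function names and sample values are hypothetical.

```python
import math

def word_confidence_from_probs(char_probs):
    """Assumed word confidence: product of the predicted characters' probabilities."""
    return math.prod(char_probs)

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted average of |mean confidence - accuracy| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece

# Made-up example: four words predicted at 0.9 confidence, three of them correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

In this example the model claims 90% confidence but is right 75% of the time, giving an ECE of about 0.15. An overconfident recognizer shows this pattern across bins: mean confidence sits above accuracy.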


Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Natural Language Processing Techniques