On Calibration of Scene-Text Recognition Models
Ron Slossberg, Oron Anschel, Amir Markovitz, Ron Litman, Aviad Aberdam, Shahar Tsiper, Shai Mazor, Jon Wu, R. Manmatha

TL;DR
This paper investigates confidence calibration issues in scene-text recognition models, revealing overconfidence problems, and proposes sequence calibration methods that significantly improve calibration and accuracy.
Contribution
It introduces sequence-based calibration techniques for STR models, reducing calibration error and enhancing recognition accuracy.
Findings
Calibration error reduced by up to a factor of nearly 7.
Sequence calibration improves word-level confidence estimates.
Applying sequence calibration as a preprocessing step to beam search consistently improves accuracy.
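The calibration-error figures above refer to word-level expected calibration error (ECE). The following minimal Python sketch shows how that quantity can be computed. It makes two assumptions that are common but are not confirmed here as the paper's exact protocol: a word's confidence is the product of its per-character top-class probabilities, and ECE uses 10 equal-width bins. The function names `word_confidence` and `expected_calibration_error` are hypothetical.

```python
import math
from typing import List, Sequence


def word_confidence(char_probs: Sequence[float]) -> float:
    """Word confidence as the product of per-character top-class probabilities."""
    return math.prod(char_probs)


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Expected calibration error over equal-width confidence bins in [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # A confidence of exactly 1.0 is placed in the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        # Weight each bin's |confidence - accuracy| gap by its share of samples.
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece
```

In this sketch, `correct[i]` is assumed to mean that the predicted word exactly matches its label. A model is overconfident within a bin when the mean confidence exceeds the accuracy. ECE reports the size of that gap, weighted by bin size, but not its sign.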
Abstract
In this work, we study the problem of word-level confidence calibration for scene-text recognition (STR). Although the topic of confidence calibration has been an active research area for the last several decades, the case of structured and sequence prediction calibration has been scarcely explored. We analyze several recent STR methods and show that they are consistently overconfident. We then focus on the calibration of STR models on the word rather than the character level. In particular, we demonstrate that for attention-based decoders, calibration of individual character predictions increases word-level calibration error compared to an uncalibrated model. In addition, we apply existing calibration methodologies as well as new sequence-based extensions to numerous STR models, demonstrating reduced calibration error by up to a factor of nearly 7. Finally, we show consistently…
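The abstract contrasts calibrating individual characters with calibrating at the word level. The sketch below gives one illustration of a sequence-level extension, which is not necessarily the formulation used in the paper. It applies temperature scaling with a single shared temperature T to every character's logits. It then fits T by minimizing the negative log-likelihood of word correctness rather than of character labels. The helper names, the grid of candidate temperatures, and the product-of-characters word confidence are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical layout: one row of class logits per decoded character.
WordLogits = List[List[float]]


def softmax_max(logits: Sequence[float], temperature: float) -> float:
    """Top-class probability after dividing the logits by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    return max(exps) / sum(exps)


def scaled_word_confidence(word: WordLogits, temperature: float) -> float:
    """Word confidence: product of temperature-scaled per-character top probabilities."""
    # The same temperature is shared by every character of every word.
    return math.prod(softmax_max(row, temperature) for row in word)


def word_nll(words: Sequence[WordLogits], correct: Sequence[bool], temperature: float) -> float:
    """Mean binary NLL of word correctness under the scaled word confidence."""
    eps = 1e-12
    total = 0.0
    for word, ok in zip(words, correct):
        p = min(max(scaled_word_confidence(word, temperature), eps), 1.0 - eps)
        total -= math.log(p) if ok else math.log(1.0 - p)
    return total / len(words)


def fit_sequence_temperature(
    words: Sequence[WordLogits],
    correct: Sequence[bool],
    grid: Optional[Sequence[float]] = None,
) -> float:
    """Pick the temperature on `grid` that minimizes word-level NLL."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # roughly 0.5, 0.55, ..., 5.0
    # Larger T lowers every word confidence, so overconfident models end up with T > 1.
    return min(grid, key=lambda t: word_nll(words, correct, t))
```

As an example, consider ten three-character words whose characters all have logits `[5.0, 0.0, 0.0]`, of which only half are correct. At T = 1 each word gets a confidence of about 0.96. The fit raises T to roughly 2.45, where the word confidence is close to the observed 0.5 accuracy. A grid search keeps the sketch dependency-free. In practice, T would typically be fitted on a held-out validation set with a gradient-based optimizer. Dividing the logits by a positive T leaves each step's argmax unchanged, so greedy predictions stay the same and only the confidence attached to each word moves.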
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Natural Language Processing Techniques
