Edit Probability for Scene Text Recognition
Fan Bai, Zhanzhan Cheng, Yi Niu, Shiliang Pu, Shuigeng Zhou

TL;DR
This paper introduces edit probability (EP), a novel method for scene text recognition that addresses misalignment issues in attention-based models, leading to improved accuracy on standard benchmarks.
Contribution
The paper proposes the edit probability (EP) approach, which effectively handles character misalignments in attention-based scene text recognition models, enhancing training and recognition performance.
Findings
EP improves recognition accuracy on standard scene text recognition benchmarks.
The method mitigates the misalignment between ground-truth strings and attention outputs during training.
In the reported experiments, EP outperforms existing state-of-the-art methods.
Abstract
We consider the scene text recognition problem under the attention-based encoder-decoder framework, which is the state of the art. Existing methods usually employ a frame-wise maximum likelihood loss to optimize the models. During training, the misalignment between the ground truth strings and the attention model's output sequences of probability distributions, which is caused by missing or superfluous characters, confuses and misleads the training process, and consequently makes training costly and degrades recognition accuracy. To handle this problem, we propose a novel method called edit probability (EP) for scene text recognition. EP tries to effectively estimate the probability of generating a string from the output sequence of probability distributions conditioned on the input image, while considering the possible occurrences of missing/superfluous characters. The…
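To make the idea in the abstract concrete, here is a minimal sketch of an edit-probability-style dynamic program. It is an illustration only, not the paper's formulation: the function name `edit_probability`, the three operations (consume, delete, insert), the fixed operation probabilities, and the uniform insertion model over a hypothetical `alphabet_size` are all assumptions made for this example. The point it shows is that, unlike a frame-wise likelihood, the score stays non-zero when the output sequence has a superfluous or missing step.

```python
from typing import Dict, List


def edit_probability(
    step_probs: List[Dict[str, float]],
    target: str,
    p_consume: float = 0.8,   # assumed fixed; a real model would likely predict this
    p_delete: float = 0.1,    # assumed: the output step is superfluous and is skipped
    p_insert: float = 0.1,    # assumed: a target character has no matching output step
    alphabet_size: int = 36,  # hypothetical alphabet for the uniform insertion model
) -> float:
    """Sum, over all edit paths, the probability of producing `target`
    from the per-step character distributions in `step_probs`."""
    n, m = len(step_probs), len(target)
    # dp[i][j]: score of generating target[:j] from the first i output steps.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                # Consume: output step i emits target character j.
                char_prob = step_probs[i - 1].get(target[j - 1], 0.0)
                total += dp[i - 1][j - 1] * p_consume * char_prob
            if i > 0:
                # Delete: treat output step i as a superfluous character.
                total += dp[i - 1][j] * p_delete
            if j > 0:
                # Insert: target character j is missing from the outputs.
                total += dp[i][j - 1] * p_insert / alphabet_size
            dp[i][j] = total
    return dp[n][m]


if __name__ == "__main__":
    aligned = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
    # With only consumption allowed, the score is the frame-wise product 0.9 * 0.8.
    print(edit_probability(aligned, "ab", 1.0, 0.0, 0.0))

    # One superfluous step: a frame-wise alignment fails, the edit-aware score does not.
    extra = [{"a": 0.9, "b": 0.1}, {"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
    print(edit_probability(extra, "ab"))
```

With deletion and insertion disabled, the sketch reduces to the usual frame-wise product, which is why a single extra or missing step drives that score to zero. The scores here are not normalized probabilities; a faithful implementation would follow the definitions in the paper itself.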
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
