Improving Word Recognition using Multiple Hypotheses and Deep Embeddings
Siddhant Bansal, Praveen Krishnan, C.V. Jawahar

TL;DR
This paper introduces EmbedNet and CAB, which combine multiple recognizer hypotheses, deep embeddings, and confidence scores to improve Hindi word recognition accuracy by around 10% absolute.
Contribution
The paper presents EmbedNet, an embedding network trained with a triplet loss, and the Confidence based Accuracy Booster (CAB) module, which together improve recognition accuracy over existing methods.
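To make the triplet-loss idea concrete, here is a minimal sketch of the standard hinge-style formulation on Euclidean distances. It is an illustration only, not the authors' implementation: the function name `triplet_loss`, the margin of 1.0, and the 2-D toy vectors are all assumptions, and the paper's exact loss variant may differ.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss for one triplet (margin value is an assumption)."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor to matching sample
    d_neg = np.linalg.norm(anchor - negative)  # anchor to non-matching sample
    # Zero once the negative is at least `margin` farther away than the positive.
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings (hypothetical values): a word image, its correct
# transcription, and an incorrect transcription.
image_emb = np.array([0.0, 0.0])
correct_text_emb = np.array([0.0, 1.0])
wrong_text_emb = np.array([3.0, 4.0])

# Correct text is already much closer than the wrong one: no penalty.
loss_satisfied = triplet_loss(image_emb, correct_text_emb, wrong_text_emb)
# Roles swapped: the "positive" is farther away, so the loss is positive.
loss_violated = triplet_loss(image_emb, wrong_text_emb, correct_text_emb)
```

Minimizing this quantity over many triplets pulls each word image toward its own transcription in the embedding space and pushes it away from other transcriptions.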
Findings
Achieves around 10% absolute improvement in word recognition accuracy.
Effectively utilizes deep embeddings and confidence scores for better prediction.
Systematically evaluated on Hindi book collections.
Abstract
We propose a novel scheme for improving the word recognition accuracy using word image embeddings. We use a trained text recognizer, which can predict multiple text hypotheses for a given word image. Our fusion scheme improves the recognition process by utilizing the word image and text embeddings obtained from a trained word image embedding network. We propose EmbedNet, which is trained using a triplet loss for learning a suitable embedding space where the embedding of the word image lies closer to the embedding of the corresponding text transcription. The updated embedding space thus helps in choosing the correct prediction with higher confidence. To further improve the accuracy, we propose a plug-and-play module called Confidence based Accuracy Booster (CAB). The CAB module takes in the confidence scores obtained from the text recognizer and Euclidean distances between the embeddings…
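The selection step described in the abstract can be sketched as a nearest-neighbour lookup: embed the word image, embed each text hypothesis, and keep the hypothesis whose embedding is closest in Euclidean distance. The code below is a hedged illustration under stated assumptions. The function `pick_hypothesis`, the example words, and the 2-D vectors are hypothetical stand-ins for real network outputs, and the sketch deliberately leaves out how CAB uses the recognizer's confidence scores, since the abstract above is truncated before describing it.

```python
import numpy as np

def pick_hypothesis(image_emb, hypotheses, text_embs):
    """Return the hypothesis whose text embedding is nearest to the image embedding.

    image_emb : (d,) embedding of the word image
    hypotheses: list of k candidate transcriptions from the recognizer
    text_embs : (k, d) embeddings of those candidates
    """
    # Euclidean distance from the image embedding to every candidate.
    dists = np.linalg.norm(text_embs - image_emb, axis=1)
    best = int(np.argmin(dists))
    return hypotheses[best], dists

# Hypothetical recognizer output: three candidate transcriptions.
hypotheses = ["कमल", "कलम", "कमला"]
# Hypothetical embeddings; in practice these would come from the trained network.
text_embs = np.array([[0.9, 0.1],
                      [0.2, 0.8],
                      [0.5, 0.5]])
image_emb = np.array([0.25, 0.75])

chosen, dists = pick_hypothesis(image_emb, hypotheses, text_embs)
print(chosen, dists.round(3))
```

With these toy values the second candidate is selected because its embedding lies closest to the image embedding, even though the recognizer listed another candidate first.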
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Triplet Loss
