Improving Scene Text Recognition for Character-Level Long-Tailed Distribution
Sunghyun Park, Sunghyo Chung, Jungsoo Lee, Jaegul Choo

TL;DR
This paper introduces CAFE-Net, a novel approach for scene text recognition that employs two specialized experts to handle long-tailed character distributions, improving recognition accuracy for languages with many characters.
Contribution
The paper proposes a dual-expert framework with a confidence ensemble method to enhance scene text recognition in long-tailed character distributions, addressing limitations of existing models.
Findings
CAFE-Net improves recognition accuracy on long-tailed character datasets.
The dual-expert approach effectively balances contextual and visual information.
The method is adaptable to various scene text recognition models.
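The confidence ensemble named in the contribution can be illustrated with a minimal sketch. The summary does not spell out the actual rule, so the version below rests on an assumption: at each character position, the expert whose softmax output has the higher peak probability supplies the prediction. The function names, the two logit lists, and the three-character `charset` are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_ensemble(context_logits, visual_logits, charset):
    """Pick, per character position, the prediction of the more confident expert.

    ASSUMPTION: "confidence" is taken to be the maximum softmax probability;
    the paper's exact ensemble rule may differ.
    """
    chars = []
    for ctx, vis in zip(context_logits, visual_logits):
        p_ctx, p_vis = softmax(ctx), softmax(vis)
        # Keep the distribution whose top probability is higher.
        chosen = p_ctx if max(p_ctx) >= max(p_vis) else p_vis
        chars.append(charset[chosen.index(max(chosen))])
    return "".join(chars)


# Hypothetical toy inputs: two character positions, three-character vocabulary.
charset = ["a", "b", "c"]
context_logits = [[4.0, 0.0, 0.0], [0.5, 0.4, 0.6]]  # confident at position 0
visual_logits = [[1.0, 0.9, 0.8], [0.0, 5.0, 0.0]]   # confident at position 1
result = confidence_ensemble(context_logits, visual_logits, charset)
print(result)
```

In this toy case the context-oriented expert decides the first character and the visually oriented expert decides the second, which is the kind of per-character balancing the findings describe.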
Abstract
Despite the recent remarkable improvements in scene text recognition (STR), the majority of studies have focused mainly on the English language, which includes only a small number of characters. However, STR models show a large performance degradation on languages with a large number of characters (e.g., Chinese and Korean), especially on characters that rarely appear due to the long-tailed distribution of characters in such languages. To address this issue, we conducted an empirical analysis using synthetic datasets with different character-level distributions (e.g., balanced and long-tailed distributions). While increasing a substantial number of tail classes without considering the context helps the model to correctly recognize characters individually, training with such a synthetic dataset interferes with the model's learning of contextual information (i.e., relation among…
Taxonomy
Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques
