What Is Wrong with Synthetic Data for Scene Text Recognition? A Strong Synthetic Engine with Diverse Simulations and Self-Evolution
Xingsong Ye, Yongkun Du, JiaXin Zhang, Chen Li, Jing Lyu, Zhineng Chen

TL;DR
This paper introduces UnionST, a synthetic data engine with diverse simulations for scene text recognition, and a self-evolution learning (SEL) framework. Together they make synthetic data substantially more effective and reduce dependence on real data.
Contribution
The paper presents UnionST, a novel synthetic data engine with enhanced diversity and realism, and a self-evolution learning method for efficient real data annotation in scene text recognition.
Findings
Models trained on UnionST-S outperform models trained on existing synthetic datasets.
In some scenarios, models trained on UnionST-S surpass models trained on real data.
With self-evolution learning (SEL), models achieve competitive results using only 9% of real labels.
Abstract
Large-scale, category-balanced text data is essential for training effective Scene Text Recognition (STR) models, but such data is hard to obtain when collecting real images. Synthetic data offers a cost-effective and perfectly labeled alternative. However, its performance often lags behind, revealing a significant domain gap between real and current synthetic data. In this work, we systematically analyze mainstream rendering-based synthetic datasets and identify their key limitation: insufficient diversity in corpus, font, and layout, which restricts their realism in complex scenarios. To address these issues, we introduce UnionST, a strong data engine that synthesizes text covering a union of challenging samples and better aligns with the complexity observed in the wild. We then construct UnionST-S, a large-scale synthetic dataset with improved simulations in challenging scenarios.…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Speech Recognition and Synthesis
