Scene Text Synthesis for Efficient and Effective Deep Network Training
Changgong Zhang, Fangneng Zhan, Hongyuan Zhu, Shijian Lu

TL;DR
This paper introduces a novel scene text image synthesis method that embeds foreground objects into backgrounds with semantic coherence and appearance harmony, improving deep network training for text detection and recognition.
Contribution
It presents an innovative image synthesis technique that improves training data quality by realistically embedding foreground objects into background images, boosting scene text detection and recognition performance.
Findings
Networks trained on the synthesized images perform comparably to, or better than, networks trained on real images.
The method improves deep network robustness and accuracy in scene text tasks.
The method is effective across diverse datasets and challenging scenarios.
Abstract
A large amount of annotated training images is critical for training accurate and robust deep network models, but collecting such images is often time-consuming and costly. Image synthesis alleviates this constraint by generating annotated training images automatically, an approach that has attracted increasing interest in recent deep learning research. We develop an innovative image synthesis technique that composes annotated training images by realistically embedding foreground objects of interest (OOI) into background images. The proposed technique consists of two key components that in principle boost the usefulness of the synthesized images in deep network training. The first is context-aware semantic coherence, which ensures that the OOI are placed around semantically coherent regions within the background image. The second is harmonious…
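The placement idea in the abstract can be illustrated with a minimal sketch. It assumes a semantic label map for the background is already available, and it treats "semantically coherent" as "the whole patch falls inside regions with an allowed label". The function names, the allowed-label set, and the plain alpha blend are hypothetical choices made for illustration, not the authors' implementation. The appearance-harmony component is not modeled here.

```python
import numpy as np

def find_coherent_positions(label_map, allowed_labels, patch_h, patch_w):
    """Return top-left (row, col) positions where a patch_h x patch_w window
    lies entirely inside regions whose label is in allowed_labels.
    (Hypothetical helper; the coherence criterion is an assumption.)"""
    ok = np.isin(label_map, list(allowed_labels)).astype(np.int64)
    # Integral image with a zero border so each window sum is four lookups.
    integral = np.pad(ok, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    window_sum = (
        integral[patch_h:, patch_w:]
        - integral[:-patch_h, patch_w:]
        - integral[patch_h:, :-patch_w]
        + integral[:-patch_h, :-patch_w]
    )
    return np.argwhere(window_sum == patch_h * patch_w)

def embed_patch(background, patch, alpha, top, left):
    """Alpha-blend patch (h x w x 3) into a copy of background at (top, left).
    Returns the composite and the bounding box (x0, y0, x1, y1), which serves
    as the automatically generated annotation."""
    out = background.astype(np.float64)  # astype returns a new array
    h, w = alpha.shape
    a = alpha[..., None]  # broadcast alpha over the colour channels
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = a * patch + (1.0 - a) * region
    return out.astype(background.dtype), (left, top, left + w, top + h)

# Toy example: label 1 marks a region assumed suitable for text (e.g. a wall).
label_map = np.zeros((6, 8), dtype=np.int64)
label_map[2:5, 3:8] = 1
background = np.zeros((6, 8, 3), dtype=np.uint8)
patch = np.full((2, 3, 3), 255, dtype=np.uint8)   # stand-in for rendered text
alpha = np.ones((2, 3), dtype=np.float64)         # fully opaque mask

positions = find_coherent_positions(label_map, {1}, 2, 3)
top, left = positions[0]
composite, box = embed_patch(background, patch, alpha, int(top), int(left))
```

Because the bounding box is produced at the same time as the composite, every synthesized image comes with its annotation for free, which is the property the abstract relies on.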
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques
