TL;DR
This paper introduces a scene text erasing method trained only on synthetic data. It combines a stroke mask prediction module with a background inpainting module and reports superior results on multiple datasets without relying on large real-world datasets.
Contribution
The authors propose an enhanced synthetic text generation engine and a specialized network architecture for scene text erasing. Trained only on the generated data, the model outperforms existing methods trained on real data.
Findings
Outperforms state-of-the-art methods on multiple datasets
Effective training with synthetic data alone
Partial text erasing with bounding box or detector input
Abstract
Scene text erasing, which replaces text regions with reasonable content in natural images, has drawn significant attention in the computer vision community in recent years. There are two potential subtasks in scene text erasing: text detection and image inpainting. Both subtasks require considerable data to achieve better performance; however, the lack of a large-scale real-world scene-text removal dataset does not allow existing methods to realize their potential. To compensate for the lack of pairwise real-world data, we made considerable use of synthetic text after additional enhancement and subsequently trained our model only on the dataset generated by the improved synthetic text engine. Our proposed network contains a stroke mask prediction module and background inpainting module that can extract the text stroke as a relatively small hole from the cropped text image to maintain…
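The two-module idea in the abstract can be illustrated with a minimal sketch, under stated assumptions: crop a text region by its bounding box, obtain a stroke mask, inpaint, and write new pixels back only where the mask marks a stroke. The function names, the `(x0, y0, x1, y1)` box format, the uint8 image assumption, and the two toy stand-ins (`toy_stroke_mask`, `toy_inpaint`) are hypothetical placeholders for illustration. They are not the authors' networks or compositing rule.

```python
import numpy as np

def composite_with_stroke_mask(crop, inpainted, stroke_mask):
    """Keep original pixels except where the stroke mask is set (uint8 images assumed)."""
    mask = stroke_mask.astype(np.float32)[..., None]  # H x W x 1, values in [0, 1]
    blended = mask * inpainted.astype(np.float32) + (1.0 - mask) * crop.astype(np.float32)
    return np.rint(blended).astype(crop.dtype)

def erase_in_box(image, box, predict_stroke_mask, inpaint_background):
    """Erase text inside one bounding box; the two callables stand in for the modules."""
    x0, y0, x1, y1 = box  # hypothetical box format, pixel coordinates
    crop = image[y0:y1, x0:x1]
    mask = predict_stroke_mask(crop)        # H x W boolean or [0, 1] mask
    filled = inpaint_background(crop, mask)  # same shape as crop
    result = image.copy()                    # leave the input image untouched
    result[y0:y1, x0:x1] = composite_with_stroke_mask(crop, filled, mask)
    return result

# Toy stand-ins (assumptions): dark pixels count as strokes, and holes are
# filled with the mean colour of the remaining background pixels.
def toy_stroke_mask(crop):
    return crop.mean(axis=-1) < 128

def toy_inpaint(crop, mask):
    filled = crop.copy()
    if mask.any() and (~mask).any():
        filled[mask] = crop[~mask].mean(axis=0).astype(crop.dtype)
    return filled

if __name__ == "__main__":
    image = np.full((8, 8, 3), 200, dtype=np.uint8)
    image[3, 2:6] = 0  # a dark horizontal "stroke"
    erased = erase_in_box(image, (1, 1, 7, 7), toy_stroke_mask, toy_inpaint)
    print(erased[3, 2:6, 0])
```

Because only masked pixels are replaced, everything outside the stroke stays bit-identical to the input, which is one plausible reading of why a small stroke-shaped hole is easier to fill than a whole text box.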
Taxonomy
Methods: Inpainting
