Self-Supervised Text Erasing with Controllable Image Synthesis
Gangwei Jiang, Shiyao Wang, Tiezheng Ge, Yuning Jiang, Ying Wei, Defu Lian

TL;DR
This paper introduces an unsupervised framework for scene text erasing that synthesizes training data with style control, enabling effective text removal without costly annotations, and demonstrates superior results on a new challenging dataset.
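The key idea behind the synthetic training data is that pasting text onto a clean image gives you the erasure ground truth for free: it is simply the image before the text was added. The sketch below illustrates only that idea. It is not the authors' synthesis function, and the summary does not describe their "two synthetic mechanisms". The rectangle "renderer", the grayscale 2-D list images, and the `style` keys (`color`, `top`, `left`, `text_h`, `text_w`) are all hypothetical stand-ins for real style parameters such as font, colour, and placement.

```python
def render_text_mask(height, width, top, left, text_h, text_w):
    """Hypothetical stand-in for a text renderer: a filled rectangle.

    A real pipeline would rasterise glyphs with a chosen font; here a box
    marks which pixels the 'text' covers (1) and which it leaves alone (0).
    """
    return [
        [1 if top <= r < top + text_h and left <= c < left + text_w else 0
         for c in range(width)]
        for r in range(height)
    ]


def synthesize(background, style):
    """Paint styled 'text' onto a clean background image (grayscale, 2-D list).

    Returns (synthetic_image, erasure_ground_truth, text_mask). The ground
    truth is the untouched background, so the label costs nothing.
    """
    height, width = len(background), len(background[0])
    mask = render_text_mask(height, width, style["top"], style["left"],
                            style["text_h"], style["text_w"])
    synthetic = [
        [style["color"] if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(background, mask)
    ]
    ground_truth = [row[:] for row in background]  # copy of the clean image
    return synthetic, ground_truth, mask


# Usage: a hypothetical style with bright text on a dark background.
background = [[10] * 6 for _ in range(4)]
style = {"color": 255, "top": 1, "left": 2, "text_h": 2, "text_w": 3}
synthetic, ground_truth, mask = synthesize(background, style)
```

In this toy setup, "style control" just means choosing the values in `style`. Varying them produces differently styled training pairs while the ground truth stays exact.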
Contribution
The paper proposes a novel self-supervised text erasing method with style-aware synthetic image generation and a triplet loss, reducing reliance on labeled data and improving erasing quality.
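The summary names a triplet loss but does not define it. For reference, the generic triplet margin loss is max(0, d(a, p) - d(a, n) + margin). It pulls an anchor toward a positive and pushes it at least `margin` farther from a negative. The sketch below uses plain Euclidean distance on feature vectors. The role assignment is purely an assumption: the erased output as anchor, the clean ground truth as positive, and the original text image as negative. The paper's actual formulation may differ.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, l2_distance(anchor, positive)
               - l2_distance(anchor, negative) + margin)


# Hypothetical roles (an assumption, not confirmed by the summary):
#   anchor   = features of the erased output
#   positive = features of the erasure ground truth (clean background)
#   negative = features of the original image that still contains text
erased = [0.9, 0.1, 0.0]
clean = [1.0, 0.0, 0.0]
with_text = [0.0, 1.0, 1.0]
loss = triplet_margin_loss(erased, clean, with_text, margin=0.5)
```

Here the erased features already sit much closer to the clean target than to the text image, so the margin is satisfied and the loss is zero.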
Findings
Achieves an FID of 5.07 on PosterErase, outperforming supervised methods.
Introduces a new challenging dataset with 60K high-resolution posters.
Demonstrates the effectiveness of style control in synthetic data generation.
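The summary reports FID (Fréchet Inception Distance) without defining it. The standard definition compares the Gaussian statistics of Inception-network features for two image sets. Here, μ_r, Σ_r are the mean and covariance for the reference images and μ_g, Σ_g are those for the produced images (in this setting, presumably the erased outputs; the exact evaluation protocol is not given in the summary). Lower values mean the two distributions are closer.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```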
Abstract
Recent efforts on scene text erasing have shown promising results. However, existing methods require rich yet costly label annotations to obtain robust models, which limits their use in practical applications. To this end, we study an unsupervised scenario by proposing a novel Self-supervised Text Erasing (STE) framework that jointly learns to synthesize training images with erasure ground-truth and accurately erase texts in the real world. We first design a style-aware image synthesis function to generate synthetic images with diverse styled texts based on two synthetic mechanisms. To bridge the text style gap between the synthetic and real-world data, a policy network is constructed to control the synthetic mechanisms by picking style parameters with the guidance of two specifically designed rewards. The synthetic training images with erasure ground-truth are then fed to train a…
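A policy that "picks style parameters with the guidance of rewards" is a reinforcement-learning setup. One common way to train such a policy is the REINFORCE policy-gradient estimator. The sketch below applies REINFORCE to a single categorical distribution over discrete style choices (think of font-size buckets). It is an illustration under stated assumptions, not the paper's policy network. There is no neural network here, and the summary does not say which estimator the authors use. `toy_reward` is a hypothetical placeholder for the paper's two specifically designed rewards, which the summary does not describe.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_style_policy(num_choices, reward_fn, steps=500, lr=0.5, seed=0):
    """REINFORCE over a categorical distribution of discrete style choices.

    The policy is pi(a) = softmax(logits)[a]; the gradient of log pi(a)
    with respect to the logits is one_hot(a) - probs.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_choices
    baseline = 0.0  # running mean of rewards, reduces gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(num_choices), weights=probs)[0]
        reward = reward_fn(action)
        advantage = reward - baseline
        for i in range(num_choices):
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_pi
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)


# Hypothetical reward: pretend style index 2 best matches real-world text.
def toy_reward(action):
    return 1.0 if action == 2 else 0.0


probs = train_style_policy(num_choices=4, reward_fn=toy_reward)
best = max(range(len(probs)), key=probs.__getitem__)
```

With the toy reward, probability mass concentrates on the rewarded style. In a full system, the reward would instead measure how well synthetic text style matches real data, and the chosen parameters would feed a synthesis step.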
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection
