Exploring Stroke-Level Modifications for Scene Text Editing
Yadong Qu, Qingfeng Tan, Hongtao Xie, Jianjun Xu, Yuxin Wang, Yongdong Zhang

TL;DR
This paper introduces MOSTEL, a novel scene text editing network that uses stroke guidance maps and semi-supervised hybrid learning to improve editing quality on real-world images, outperforming previous methods.
Contribution
The paper proposes a stroke-level modification approach with explicit guidance and semi-supervised training, addressing background interference and domain gap issues in scene text editing.
Findings
MOSTEL outperforms previous methods in quality and accuracy.
New datasets Tamper-Syn2k and Tamper-Scene are introduced.
Semi-supervised training improves real-world editing performance.
Abstract
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text. However, due to the complicated background textures and various text styles, existing methods fall short in generating clear and legible edited text images. In this study, we attribute the poor editing performance to two problems: 1) Implicit decoupling structure. Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously. 2) Domain gap. Due to the lack of edited real scene text images, the network can only be well trained on synthetic pairs and performs poorly on real-world images. To handle the above problems, we propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL). Firstly, we generate stroke guidance maps to explicitly indicate regions to be edited. Different…
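To make the idea of explicit stroke guidance concrete, the following is a minimal sketch, not the authors' implementation. It assumes a stroke guidance map is a per-pixel weight in [0, 1] marking text strokes, and it uses a simple alpha-style blend so that only stroke pixels are taken from the generated image while the background is copied from the source. The function name `composite_with_stroke_map` and the blending rule are hypothetical illustrations; the abstract states only that the maps indicate the regions to be edited.

```python
import numpy as np


def composite_with_stroke_map(source, generated, stroke_map):
    """Blend `generated` into `source` where `stroke_map` is high.

    Hypothetical illustration: the blend rule is an assumption,
    not taken from the MOSTEL paper.
    source, generated: float arrays of shape (H, W, C)
    stroke_map: float array of shape (H, W) or (H, W, 1), values in [0, 1]
    """
    source = np.asarray(source, dtype=np.float32)
    generated = np.asarray(generated, dtype=np.float32)
    mask = np.clip(np.asarray(stroke_map, dtype=np.float32), 0.0, 1.0)
    if mask.ndim == source.ndim - 1:
        # Add a channel axis so the mask broadcasts over colour channels.
        mask = mask[..., None]
    # Stroke pixels come from the generated image, the rest from the source.
    return mask * generated + (1.0 - mask) * source


# Toy usage: a 2x2 image where only some pixels are marked as strokes.
src = np.zeros((2, 2, 3), dtype=np.float32)
gen = np.ones((2, 2, 3), dtype=np.float32)
strokes = np.array([[1.0, 0.0], [0.0, 0.5]], dtype=np.float32)
edited = composite_with_stroke_map(src, gen, strokes)
```

In this toy case, background pixels (mask 0) keep their source values exactly, which is the property an explicit guidance map is meant to provide compared with rewriting every pixel of the image.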
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques · Video Analysis and Summarization
