Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling

Zixiao Wang, Hongtao Xie, YuXin Wang, Yadong Qu, Fengjun Guo, and Pengwei Liu

arXiv:2409.13431 · cs.CV · September 23, 2024

TL;DR

This paper introduces TMIM, a novel weakly supervised pretraining method for scene text removal that leverages text localization data to improve performance and reduce reliance on costly pixel-level annotations.

Contribution

The paper proposes a new Text-aware Masked Image Modeling approach that enables direct, weakly supervised training of scene text removal models using only text detection labels.
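Text detection labels such as axis-aligned bounding boxes can be rasterized into a coarse binary text-region mask. This mask is only a rough stand-in for the pixel-level stroke masks that STR training normally requires. The minimal sketch below assumes boxes given as (x0, y0, x1, y1) pixel coordinates with exclusive upper bounds. The function name `boxes_to_mask` is hypothetical and is not taken from the paper or its repository.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, x1/y1 exclusive.
Box = Tuple[int, int, int, int]


def boxes_to_mask(height: int, width: int, boxes: List[Box]) -> List[List[int]]:
    """Rasterize axis-aligned text boxes into a binary mask (1 = text region)."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the image bounds before filling it in.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask
```

For example, `boxes_to_mask(4, 6, [(1, 1, 4, 3)])` marks a 3×2 block of six pixels, covering rows 1 to 2 and columns 1 to 3, as text. Real detection labels are often rotated quadrilaterals or polygons, and a faithful implementation would have to rasterize those shapes instead.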

Findings

1. Achieves a state-of-the-art PSNR of 37.35 dB on SCUT-EnsText

2. Outperforms previous pretraining methods for scene text removal

3. Reduces dependence on expensive pixel-level annotations
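The headline metric, PSNR, is the standard peak signal-to-noise ratio, 10·log10(MAX²/MSE). The minimal sketch below assumes 8-bit pixel values, so MAX = 255. The evaluation protocol behind the 37.35 figure is not described here: resolution, color handling, and how scores are averaged over images are all unstated. This sketch may therefore not reproduce the reported number.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

For a single 8-bit image, a PSNR of 37.35 dB corresponds to an MSE of about 255² / 10^3.735 ≈ 12, or a root-mean-square error of roughly 3.5 gray levels.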

Abstract

The scene text removal (STR) task suffers from insufficient training data because pixel-level labeling is expensive. In this paper, we aim to address this issue by introducing a Text-aware Masked Image Modeling algorithm (TMIM), which can pretrain STR models with low-cost text detection labels (e.g., text bounding boxes). Unlike previous pretraining methods, which use indirect auxiliary tasks only to enhance implicit feature extraction, our TMIM is the first to enable the STR task to be trained directly in a weakly supervised manner, exploring STR knowledge explicitly and efficiently. In TMIM, first, a Background Modeling stream is built to learn background generation rules by recovering the masked non-text region. Meanwhile, it provides pseudo STR labels on the masked text region. Second, a Text Erasing stream is proposed to learn from the pseudo labels and equip…
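The Background Modeling stream can only be supervised where the ground truth is known to be background. In practice, that means masked pixels outside the text boxes. The minimal sketch below shows that region split. It assumes a pixel-level random mask `masked` (1 = hidden from the model), a binary text-region mask `text` (1 inside text boxes), and a per-pixel L1 loss. TMIM's actual masking granularity (patches or pixels), loss function, and weighting are not stated here, and every function name is hypothetical. Masked pixels inside text regions receive no background loss in this sketch. Following the abstract, the stream's predictions there are kept as pseudo STR labels for the Text Erasing stream.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]
Mask = List[List[int]]


def split_masked_regions(masked: Mask, text: Mask) -> Tuple[Mask, Mask]:
    """Return (masked non-text, masked text) indicator maps (hypothetical helper)."""
    bg = [[m * (1 - t) for m, t in zip(mr, tr)] for mr, tr in zip(masked, text)]
    fg = [[m * t for m, t in zip(mr, tr)] for mr, tr in zip(masked, text)]
    return bg, fg


def background_l1_loss(pred: Grid, target: Grid, masked: Mask, text: Mask) -> float:
    """Mean L1 error over masked non-text pixels only (assumed loss form)."""
    bg, _ = split_masked_regions(masked, text)
    total, count = 0.0, 0
    for pr, tr, br in zip(pred, target, bg):
        for p, t, b in zip(pr, tr, br):
            if b:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


def pseudo_label_region(pred: Grid, masked: Mask, text: Mask) -> List[List[Optional[float]]]:
    """Keep background predictions on masked text pixels as pseudo STR labels;
    all other positions are None, i.e. unsupervised in this sketch."""
    _, fg = split_masked_regions(masked, text)
    return [[p if f else None for p, f in zip(pr, fr)] for pr, fr in zip(pred, fg)]
```

The sketch uses the ground-truth image only on non-text pixels. That restriction is what lets box-level labels drive the training without any pixel-level text-free targets.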

Code & Models

Repositories

wzx99/tmim
PyTorch · Official

Taxonomy

Topics: Handwritten Text Recognition Techniques