Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions
Yibo Wang, Yunhu Ye, Yuanpeng Mao, Yanwei Yu, Yuanping Song

TL;DR
This paper introduces a self-supervised scene text segmentation method that leverages object-centric layered representations and text region information, eliminating the need for pixel-level labels or synthetic pretraining.
Contribution
It proposes a novel layered decoupling approach with a Region Query Module and Representation Consistency Constraints, which increases the model's sensitivity to text without pixel-level annotations.
Findings
Outperforms state-of-the-art unsupervised segmentation methods on public datasets.
Does not require pixel-level masks or synthetic dataset pretraining.
Effectively segments text in real scenes using only text localization inputs.
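The summary above does not say how text localization inputs are fed to the model. As a minimal sketch, assuming the localization comes as axis-aligned boxes in pixel coordinates, one common way to turn such boxes into a coarse text-region map is to rasterize them into a binary mask. The function name `boxes_to_region_mask` and the `(x0, y0, x1, y1)` box format are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import numpy as np

def boxes_to_region_mask(height, width, boxes):
    """Rasterize axis-aligned boxes (x0, y0, x1, y1) into a binary region mask.

    Hypothetical helper: the box format and the use of a binary mask are
    assumptions for illustration, not the paper's stated input encoding.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the image bounds so out-of-range boxes are safe.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        if x1 > x0 and y1 > y0:
            mask[y0:y1, x0:x1] = 1  # mark pixels inside the text region
    return mask

# A 4x6 image with one text box spanning columns 1..3 and rows 1..2.
mask = boxes_to_region_mask(4, 6, [(1, 1, 4, 3)])
```

Such a mask only says where text is roughly located; it is much coarser than the pixel-level labels that the method is said to avoid.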
Abstract
Text segmentation has a wide range of applications, such as image editing, style transfer, and watermark removal. However, existing public datasets have poor-quality pixel-level labels, which are notoriously costly to acquire in both money and time. At the same time, when pretraining is performed on synthetic datasets, the synthetic data distribution is far from that of real scenes. These issues pose a major challenge to current pixel-level text segmentation algorithms. To alleviate these problems, we propose a self-supervised scene text segmentation algorithm that uses layered decoupling of representations, derived in an object-centric manner, to segment images into text and background. In our method, we propose two novel designs, which include Region Query Module and Representation Consistency…
