TL;DR
MTRNet++ is a one-stage, mask-based scene text eraser built on a multi-branch architecture with attention blocks. It can remove text with or without an external mask, and it reports state-of-the-art results on the Oxford and SCUT datasets without using external ground-truth masks.
Contribution
The paper introduces MTRNet++, a one-stage text removal network that combines mask-refine, coarse-inpainting and fine-inpainting branches with attention blocks, and that is shown to be controllable and interpretable.
Findings
MTRNet++ achieves state-of-the-art results on both the Oxford and SCUT datasets without using external ground-truth masks.
Ablation studies show that the multi-branch architecture with attention blocks is effective and essential.
The network removes text either with or without an external mask.
Abstract
A precise, controllable, interpretable and easily trainable text removal approach is necessary for both user-specific and large-scale text removal applications. To achieve this, we propose a one-stage mask-based text inpainting network, MTRNet++. It has a novel architecture that includes mask-refine, coarse-inpainting and fine-inpainting branches, and attention blocks. With this architecture, MTRNet++ can remove text either with or without an external mask. It achieves state-of-the-art results on both the Oxford and SCUT datasets without using external ground-truth masks. The results of ablation studies demonstrate that the proposed multi-branch architecture with attention blocks is effective and essential. It also demonstrates controllability and interpretability.
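The "with or without an external mask" behaviour can be illustrated with a small sketch. This is not the authors' implementation: the function names `resolve_mask` and `composite` are hypothetical, the all-ones fallback mask is an assumption about how a mask-based network might be run when no mask is supplied, and the blending step is the generic inpainting composite rather than anything stated in the abstract.

```python
from typing import List, Optional

Image = List[List[float]]  # grayscale image, values in [0, 1]
Mask = List[List[float]]   # 1.0 = region to erase, 0.0 = region to keep


def resolve_mask(image: Image, external_mask: Optional[Mask] = None) -> Mask:
    """Return the external mask if one is given.

    ASSUMPTION (not from the paper): with no external mask, fall back to an
    all-ones mask so the whole image is treated as a candidate text region.
    """
    if external_mask is not None:
        return external_mask
    return [[1.0 for _ in row] for row in image]


def composite(image: Image, inpainted: Image, mask: Mask) -> Image:
    """Generic inpainting blend (hypothetical helper).

    Takes inpainted pixels where the mask is 1 and original pixels where it is 0.
    """
    return [
        [m * p + (1.0 - m) * x for x, p, m in zip(img_row, inp_row, mask_row)]
        for img_row, inp_row, mask_row in zip(image, inpainted, mask)
    ]


# Toy 2x2 example; `inpainted` stands in for a network output.
image = [[0.2, 0.8], [0.5, 0.5]]
inpainted = [[1.0, 1.0], [0.0, 0.0]]
user_mask = [[1.0, 0.0], [0.0, 1.0]]

with_mask = composite(image, inpainted, resolve_mask(image, user_mask))
without_mask = composite(image, inpainted, resolve_mask(image))
```

With a user-supplied mask, only the masked pixels change, which is one way to read the controllability claim; with the fallback mask, every pixel is taken from the inpainted result.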
