CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction
Liang Zhao, Qing Guo, Xiaoguang Li, and Song Wang

TL;DR
This paper introduces CLII, a cross-modal inpainting model that leverages visual and textual information to restore damaged scene text images and complete missing text, outperforming existing methods.
Contribution
The paper proposes a novel cross-modal predictive interaction model for scene text inpainting and text completion, integrating visual and textual cues for improved restoration.
Findings
Outperforms baseline methods significantly in experiments.
Effectively restores damaged scene text images across various scenarios.
Improves the robustness of scene text spotting on images with missing pixels.
Abstract
Image inpainting aims to fill missing pixels in damaged images and has achieved significant progress with cutting-edge learning techniques. Nevertheless, state-of-the-art inpainting methods are mainly designed for natural images and cannot correctly recover text within scene text images, and training existing models on scene text images does not fix the issue. In this work, we identify the visual-text inpainting task to achieve high-quality scene text image restoration and text completion: given a scene text image with unknown missing regions and the corresponding text with unknown missing characters, we aim to complete the missing information in both the image and the text by leveraging their complementary information. Intuitively, the input text, even if damaged, contains language priors about the contents of the image and can guide the image inpainting. Meanwhile, the scene text image…
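To make the task setup concrete, the following is a minimal sketch of how a paired damaged sample (image plus text) could be constructed for visual-text inpainting. It is an illustration only, not the authors' data pipeline: the function names, the rectangular missing region, the `?` placeholder character, and the toy 4x8 image are all assumptions introduced here.

```python
MASK_CHAR = "?"  # hypothetical placeholder for a missing character


def damage_text(text, missing_positions):
    """Replace the characters at the given indices with MASK_CHAR."""
    return "".join(
        MASK_CHAR if i in missing_positions else ch
        for i, ch in enumerate(text)
    )


def damage_image(image, top, left, height, width, fill=0):
    """Blank out a rectangle in a 2-D grayscale image (list of rows).

    Returns a damaged copy and a binary mask (1 = missing pixel).
    The original image is left unmodified.
    """
    damaged = [row[:] for row in image]
    mask = [[0] * len(row) for row in image]
    for r in range(top, min(top + height, len(image))):
        for c in range(left, min(left + width, len(image[r]))):
            damaged[r][c] = fill
            mask[r][c] = 1
    return damaged, mask


# Toy example: a 4x8 all-white "scene text image" and its text label.
image = [[255] * 8 for _ in range(4)]
damaged_image, mask = damage_image(image, top=1, left=2, height=2, width=3)
damaged_text = damage_text("COFFEE", {2, 3})
```

Because the abstract describes the missing regions and missing characters as unknown, a model for this task would receive only `damaged_image` and `damaged_text`; in this sketch the mask and the clean originals would serve only as supervision or for evaluation.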
Taxonomy
Methods: Inpainting
