TL;DR
This paper introduces a global transformer-based model for image inpainting that effectively captures texture and structure information across the entire image, outperforming existing methods.
Contribution
It proposes a global transformer with an encoder-decoder design and a structure-texture matching attention module to improve inpainting quality.
Findings
Outperforms existing inpainting methods on benchmark datasets.
Effectively captures global texture and structure information.
Demonstrates superior inpainting results through extensive experiments.
Abstract
Image inpainting has achieved remarkable progress and inspired abundant methods, where the critical bottleneck is identified as how to fill in the high-frequency structure and low-frequency texture information on the masked regions with semantics. To this end, deep models exhibit powerful superiority in capturing them, yet are constrained to local spatial regions. In this paper, we delve globally into texture and structure information to better capture the semantics for image inpainting. As opposed to existing methods confined to independent local patches, the texture information of each patch is reconstructed from all other patches across the whole image, to match the coarsely filled information, especially the structure information over the masked regions. Unlike the current decoder-only transformer within the pixel level for image inpainting, our model adopts the transformer pipeline…
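The idea of rebuilding each patch from all other patches in the image can be illustrated with a small attention sketch. This is not the paper's implementation: it assumes plain scaled dot-product attention, and the names `global_patch_attention`, `coarse`, and `texture` are hypothetical, with random vectors standing in for real patch features.

```python
import math
import random


def global_patch_attention(queries, keys, values):
    """Rebuild each patch as a softmax-weighted sum of all *other* patches.

    queries, keys, values: lists of equal-length float vectors, one per patch.
    Returns (reconstructed, weights). Assumes at least two patches.
    """
    n, d = len(queries), len(queries[0])
    reconstructed, weights = [], []
    for i in range(n):
        # Scaled dot-product similarity between patch i and every other patch j.
        scores = {
            j: sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in range(n)
            if j != i
        }
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exps.values())
        # Patch i gets zero weight on itself; the rest of the row sums to 1.
        row = [exps.get(j, 0.0) / total for j in range(n)]
        weights.append(row)
        reconstructed.append(
            [sum(row[j] * values[j][c] for j in range(n)) for c in range(len(values[0]))]
        )
    return reconstructed, weights


# Hypothetical stand-ins: coarsely filled features act as queries,
# texture features of every patch act as keys and values.
random.seed(0)
num_patches, dim = 6, 4
coarse = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
texture = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
reconstructed, weights = global_patch_attention(coarse, texture, texture)
```

Every row of `weights` spans the whole image rather than a local neighbourhood, which is the contrast with local-patch methods that the abstract draws. A real model would use learned projections and multiple heads instead of raw features.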
Taxonomy
Methods: Multi-Head Attention · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Attention Is All You Need · Inpainting · Adam · Softmax · Dropout · Residual Connection
