ProPainter: Improving Propagation and Transformer for Video Inpainting
Shangchen Zhou, Chongyi Li, Kelvin C.K. Chan, Chen Change Loy

TL;DR
ProPainter introduces a dual-domain propagation and a mask-guided sparse Transformer to enhance video inpainting, achieving superior performance and efficiency over previous methods by better exploiting global correspondences and reducing redundancy.
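The mask-guided sparsity idea can be illustrated with a minimal sketch. It assumes one binary mask per frame, non-overlapping square windows as query tokens, and plain strided sampling of key/value frames. The names select_query_windows and sample_key_frames, and the win and stride parameters, are hypothetical and are not taken from the ProPainter code. The sketch only shows which tokens would be kept, not the attention computation itself.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = missing pixel (to be inpainted), 0 = valid


def select_query_windows(mask: Mask, win: int) -> List[Tuple[int, int]]:
    """Return (row, col) indices of win x win windows that contain any hole.

    Hypothetical illustration of mask-guided query sparsity: windows with no
    missing pixels are skipped, so attention queries are formed only there.
    """
    h, w = len(mask), len(mask[0])
    keep = []
    for r in range(0, h, win):
        for c in range(0, w, win):
            # Scan the (possibly truncated) window at the frame border.
            has_hole = any(
                mask[i][j]
                for i in range(r, min(r + win, h))
                for j in range(c, min(c + win, w))
            )
            if has_hole:
                keep.append((r // win, c // win))
    return keep


def sample_key_frames(num_frames: int, stride: int) -> List[int]:
    """Indices of frames used for keys/values under an assumed temporal stride."""
    return list(range(0, num_frames, stride))


if __name__ == "__main__":
    # 8x8 mask with a small hole in the top-left quadrant.
    mask = [[0] * 8 for _ in range(8)]
    mask[1][2] = mask[2][2] = 1
    print(select_query_windows(mask, win=4))           # [(0, 0)]
    print(sample_key_frames(num_frames=10, stride=3))  # [0, 3, 6, 9]
```

In this toy case only 1 of the 4 windows produces queries, and only every third frame supplies keys and values. This is the kind of redundancy reduction the TL;DR refers to.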
Contribution
The paper presents a novel framework combining dual-domain propagation with an efficient sparse Transformer for improved video inpainting.
Findings
ProPainter outperforms prior methods by a margin of 1.46 dB in PSNR.
Dual-domain propagation improves global correspondence accuracy.
Sparse Transformer reduces computational redundancy.
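To put the 1.46 dB figure in perspective, recall that PSNR = 10·log10(MAX²/MSE). At a fixed peak value MAX, a gain of Δ dB therefore corresponds to multiplying the MSE by 10^(-Δ/10). The helpers below are a hypothetical illustration of that arithmetic and assume the PSNR comes from a single MSE. Benchmark PSNR is usually averaged per frame or per video, so the conversion is only approximate in that setting.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """MSE(new) / MSE(old) implied by a PSNR gain at a fixed peak value."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    ratio = mse_ratio_for_gain(1.46)
    print(f"MSE ratio: {ratio:.3f}, reduction: {1 - ratio:.1%}")
    # MSE ratio: 0.714, reduction: 28.6%
```

Under this assumption, a 1.46 dB gain means the reconstruction error (MSE) is roughly 29% lower.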
Abstract
Flow-based propagation and spatiotemporal Transformer are two mainstream mechanisms in video inpainting (VI). Despite the effectiveness of these components, they still suffer from some limitations that affect their performance. Previous propagation-based approaches are performed separately either in the image or feature domain. Global image propagation isolated from learning may cause spatial misalignment due to inaccurate optical flow. Moreover, memory or computational constraints limit the temporal range of feature propagation and video Transformer, preventing exploration of correspondence information from distant frames. To address these issues, we propose an improved framework, called ProPainter, which involves enhanced ProPagation and an efficient Transformer. Specifically, we introduce dual-domain propagation that combines the advantages of image and feature warping, exploiting…
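Image-domain propagation along optical flow can be sketched as follows. This is a simplified, hypothetical illustration and not ProPainter's implementation. It makes three assumptions: frames are grayscale nested lists, warping uses nearest-neighbour sampling rather than bilinear sampling, and an optional forward-backward flow consistency check rejects pixels whose flow is likely unreliable. The consistency check is a common heuristic assumed here, and it targets the misalignment-from-inaccurate-flow problem the abstract describes. The function name propagate_from_neighbor and its thresh parameter are illustrative only.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]                # grayscale, H x W
Mask = List[List[int]]                   # 1 = missing pixel
Flow = List[List[Tuple[float, float]]]   # per-pixel displacement (dy, dx)


def propagate_from_neighbor(
    target: Frame, target_mask: Mask,
    neighbor: Frame, neighbor_mask: Mask,
    flow_t2n: Flow, flow_n2t: Optional[Flow] = None, thresh: float = 1.0,
) -> Tuple[Frame, Mask]:
    """Fill missing target pixels by sampling a neighbor frame along optical flow.

    Returns new copies of the frame and mask; the inputs are not modified.
    """
    h, w = len(target), len(target[0])
    out = [row[:] for row in target]
    mask_out = [row[:] for row in target_mask]
    for y in range(h):
        for x in range(w):
            if not target_mask[y][x]:
                continue  # pixel is already known
            dy, dx = flow_t2n[y][x]
            ny, nx = round(y + dy), round(x + dx)  # nearest-neighbour sampling
            if not (0 <= ny < h and 0 <= nx < w) or neighbor_mask[ny][nx]:
                continue  # lands outside the frame or inside another hole
            if flow_n2t is not None:
                # Forward-backward check: following t->n then n->t should
                # return close to (y, x); otherwise the flow is distrusted.
                by, bx = flow_n2t[ny][nx]
                if abs(dy + by) + abs(dx + bx) > thresh:
                    continue
            out[y][x] = neighbor[ny][nx]
            mask_out[y][x] = 0
    return out, mask_out


if __name__ == "__main__":
    # Content moved one pixel to the right between target and neighbor.
    target, t_mask = [[0.1, 0.0, 0.3]], [[0, 1, 0]]
    neighbor, n_mask = [[0.5, 0.6, 0.7]], [[0, 0, 0]]
    flow = [[(0.0, 1.0)] * 3]
    print(propagate_from_neighbor(target, t_mask, neighbor, n_mask, flow))
    # ([[0.1, 0.7, 0.3]], [[0, 0, 0]])
```

Pixels that no neighbor can fill reliably stay masked. In a full system, those remaining holes are where learned components, such as feature propagation and the Transformer, would have to take over.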
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Advanced Vision and Imaging
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Byte Pair Encoding · Label Smoothing · Dropout · Absolute Position Encodings · Layer Normalization · Adam
