DeViT: Deformed Vision Transformers in Video Inpainting

Jiayin Cai; Changlin Li; Xin Tao; Chun Yuan; Yu-Wing Tai

arXiv:2209.13925 · cs.CV · September 29, 2022

TL;DR

DeViT introduces a novel video inpainting approach using deformable vision transformers with patch alignment, saliency-guided patch attention, and spatial-temporal weighting to improve accuracy in challenging scenes with motion and deformation.

Contribution

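The paper presents three modules: Deformed Patch-based Homography (DePtH), Mask Pruning-based Patch Attention (MPPA), and a Spatial-Temporal weighting Adaptor (STA). Together they advance video inpainting by improving patch alignment, feature matching, and attention under deformation without extra supervision.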

Findings

01. Outperforms recent methods qualitatively and quantitatively.

02. Achieves state-of-the-art results in video inpainting.

03. Effectively handles scenes with complex deformation and agile motion.

Abstract

This paper proposes a novel video inpainting method. We make three main contributions. First, we extend previous Transformers with patch alignment by introducing Deformed Patch-based Homography (DePtH), which improves patch-level feature alignment without additional supervision and benefits challenging scenes with various deformations. Second, we introduce Mask Pruning-based Patch Attention (MPPA) to improve patch-wise feature matching by pruning out less essential features and using a saliency map. MPPA enhances matching accuracy between warped tokens with invalid pixels. Third, we introduce a Spatial-Temporal weighting Adaptor (STA) module to obtain accurate attention to spatial-temporal tokens under the guidance of the Deformation Factor learned from DePtH, especially for videos with agile motions. Experimental results demonstrate that our method outperforms recent methods…
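
The abstract does not specify how MPPA prunes tokens or how the saliency map enters the attention computation, so the following is only a rough illustration of the general idea of attention over a pruned, validity-masked token set. It is not the paper's implementation. The function name `masked_patch_attention`, the `keep_ratio` parameter, the per-token `saliency` scores, and the top-k pruning rule are all assumptions introduced here for illustration.

```python
import math


def masked_patch_attention(queries, keys, values, valid, saliency, keep_ratio=0.5):
    """Scaled dot-product attention over a pruned subset of key tokens.

    Assumptions (not taken from the paper):
      - valid[j] is False when key token j covers invalid (masked) pixels,
        and such tokens are never attended to;
      - saliency[j] is a hypothetical per-token importance score;
      - pruning keeps the top `keep_ratio` fraction of valid tokens by saliency.
    Returns (outputs, kept), where kept lists the indices of retained tokens.
    """
    # Step 1: drop tokens that cover invalid pixels.
    candidates = [j for j in range(len(keys)) if valid[j]]
    if not candidates:
        raise ValueError("at least one valid key token is required")

    # Step 2: prune the less salient tokens (assumed top-k rule).
    candidates.sort(key=lambda j: saliency[j], reverse=True)
    n_keep = max(1, math.ceil(keep_ratio * len(candidates)))
    kept = sorted(candidates[:n_keep])

    # Step 3: standard softmax attention restricted to the kept tokens.
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in kept]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append(
            [sum(w * values[j][d] for w, j in zip(weights, kept)) for d in range(dim)]
        )
    return outputs, kept
```

In this sketch, invalid tokens are removed before pruning so that a highly salient but masked token can never crowd out a usable one. Whether the paper orders these steps the same way is not stated in the abstract.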

Taxonomy

Methods: Pruning · Inpainting