Deficiency-Aware Masked Transformer for Video Inpainting

Yongsheng Yu; Heng Fan; Libo Zhang

arXiv:2307.08629 · cs.CV · July 18, 2023 · 5 cites

TL;DR

This paper introduces a deficiency-aware masked transformer (DMT) framework for video inpainting. It handles cases that lack cross-frame guidance by using a pretrained image inpainting model (DMT_img) as a prior for the video model (DMT_vid), together with selective self-attention and a contextualizer module.

Contribution

The paper proposes a novel dual-modality-compatible inpainting framework with pretraining, selective self-attention, and a contextualizer to improve video inpainting, especially in deficiency scenarios.

Findings

01. DMT_vid outperforms previous methods on the YouTube-VOS and DAVIS datasets.
02. Pretraining with DMT_img enhances hallucination in deficiency cases.
03. Selective attention accelerates inference and reduces noise.

Abstract

Recent video inpainting methods have made remarkable progress by utilizing explicit guidance, such as optical flow, to propagate cross-frame pixels. However, there are cases where cross-frame recurrence of the masked video is not available, resulting in a deficiency. In such situations, instead of borrowing pixels from other frames, the focus of the model shifts towards addressing the inverse problem. In this paper, we introduce a dual-modality-compatible inpainting framework called Deficiency-aware Masked Transformer (DMT), which offers three key advantages. Firstly, we pretrain an image inpainting model, DMT_img, to serve as a prior for distilling the video model DMT_vid, thereby benefiting the hallucination of deficiency cases. Secondly, the self-attention module selectively incorporates spatiotemporal tokens to accelerate inference and remove noise signals. Thirdly, a simple yet effective…
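The second advantage, attention over a selected subset of spatiotemporal tokens, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical per-token `validity` score (for example, the fraction of unmasked pixels in a token's patch) and a hypothetical `threshold`, and it omits learned query/key/value projections. Every token still issues a query, but only tokens that pass the threshold serve as keys and values, so the per-query cost shrinks and low-validity tokens cannot inject noise.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def selective_attention(tokens, validity, threshold=0.5):
    """Scaled dot-product attention restricted to selected keys/values.

    tokens:    list of d-dimensional vectors (all frames flattened together).
    validity:  hypothetical score in [0, 1] per token (an assumption, not
               taken from the paper).
    threshold: hypothetical cut-off; tokens below it are not attended to.
    Returns (outputs, kept_indices).
    """
    keep = [i for i, v in enumerate(validity) if v >= threshold]
    if not keep:
        # Nothing passes the threshold: fall back to dense attention.
        keep = list(range(len(tokens)))

    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in tokens:  # every token is a query
        scores = [scale * sum(a * b for a, b in zip(q, tokens[j])) for j in keep]
        weights = softmax(scores)
        outputs.append(
            [sum(w * tokens[j][c] for w, j in zip(weights, keep)) for c in range(d)]
        )
    return outputs, keep


if __name__ == "__main__":
    toks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
    valid = [1.0, 0.9, 0.2, 0.0]
    out, kept = selective_attention(toks, valid)
    print(kept)  # indices of tokens used as keys/values
```

With N tokens and K kept tokens, each query scores K keys instead of N, which is where the speed-up in this sketch comes from. The selection rule actually used by DMT may differ.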

Code & Models

Repositories

yeates/dmt (Official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Advanced Vision and Imaging

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Adam · Layer Normalization