Decoupled Spatial-Temporal Transformer for Video Inpainting

Rui Liu; Hanming Deng; Yangyi Huang; Xiaoyu Shi; Lewei Lu; Wenxiu Sun; Xiaogang Wang; Jifeng Dai; Hongsheng Li

arXiv:2104.06637·cs.CV·April 15, 2021·47 cites



TL;DR

This paper introduces a Decoupled Spatial-Temporal Transformer (DSTT) that improves video inpainting by separately attending to spatial textures and temporal object movements, achieving higher quality results with greater efficiency.

Contribution

The paper proposes a novel DSTT architecture that disentangles spatial and temporal attention, enhancing inpainting quality and computational efficiency over existing Transformer-based methods.

Findings

01 Outperforms state-of-the-art video inpainting methods

02 Achieves higher efficiency with reduced computational cost

03 Produces more plausible and temporally-coherent inpainted videos

Abstract

Video inpainting aims to fill given spatiotemporal holes with realistic appearance, but it remains a challenging task even with the progress of deep learning approaches. Recent works introduce the promising Transformer architecture into deep video inpainting and achieve better performance. However, these methods still suffer from blurry synthesized textures and a huge computational cost. To this end, we propose a novel Decoupled Spatial-Temporal Transformer (DSTT) for improving video inpainting with exceptional efficiency. Our proposed DSTT disentangles the task of learning spatial-temporal attention into two sub-tasks: one attends to temporal object movements across different frames at the same spatial locations, which is achieved by a temporally-decoupled Transformer block, and the other attends to similar background textures across all spatial positions of the same frame, which is achieved by…
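The decoupling described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a video already tokenized into an array of shape `(T, N, D)` (T frames, N spatial locations per frame, D channels), uses single-head attention with no learned projections, and all function names (`self_attention`, `temporal_attention`, `spatial_attention`, `joint_attention`, `attention_pairs`) are hypothetical. It only shows which tokens may attend to each other in each sub-task and how many query-key pairs each variant scores.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # Single-head scaled dot-product self-attention over the
    # second-to-last axis. tokens: (..., L, D).
    # Simplifying assumption: Q = K = V = tokens (no learned projections).
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ tokens


def temporal_attention(x):
    # x: (T, N, D). Regroup to (N, T, D) so each spatial location
    # attends only to the same location in the other frames.
    per_location = np.swapaxes(x, 0, 1)
    return np.swapaxes(self_attention(per_location), 0, 1)


def spatial_attention(x):
    # x: (T, N, D). Each frame is treated as a batch element, so a
    # token attends only to the spatial positions of its own frame.
    return self_attention(x)


def joint_attention(x):
    # Baseline for comparison: every token attends to all T * N tokens.
    T, N, D = x.shape
    return self_attention(x.reshape(T * N, D)).reshape(T, N, D)


def attention_pairs(T, N):
    # Number of query-key pairs scored by each variant.
    return {
        "joint": (T * N) ** 2,
        "temporal": N * T * T,
        "spatial": T * N * N,
    }
```

Under these assumptions, a clip with 5 frames and 100 locations per frame gives `attention_pairs(5, 100)` equal to 250000 pairs for joint attention, against 2500 for the temporal sub-task and 50000 for the spatial one, which is the kind of saving the efficiency claim refers to.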


Code & Models

Repositories

ruiliu-ai/DSTT (PyTorch, official)


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Computer Graphics and Visualization Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Inpainting · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Dropout · Adam