Pair-wise Layer Attention with Spatial Masking for Video Prediction

Ping Li; Chenhan Zhang; Zheng Yang; Xianghua Xu; Mingli Song

arXiv:2311.11289 · cs.CV · November 21, 2023 · 1 citation

TL;DR

This paper introduces the PLA-SM framework for video prediction, combining layer-wise semantic dependency enhancement and spatial feature masking to improve the quality of predicted frames by capturing detailed textures and spatiotemporal dynamics.

Contribution

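The paper proposes the Pair-wise Layer Attention with Spatial Masking (PLA-SM) framework. Its PLA module strengthens the layer-wise semantic dependency among the feature maps of the Translator's U-shape structure by coupling low-level visual cues with high-level features, and its SM module masks part of the Encoder's features so that spatial information is used more fully.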

Findings

01. Outperforms existing methods on five benchmarks.

02. Enriches texture details in predicted frames.

03. Effectively captures spatiotemporal dynamics.

Abstract

Video prediction generates future frames from historical frames and has shown great potential in many applications, e.g., meteorological prediction and autonomous driving. Previous works often decode only the final high-level semantic features into future frames, which loses texture details and degrades prediction quality. Motivated by this, we develop a Pair-wise Layer Attention (PLA) module to enhance the layer-wise semantic dependency of the feature maps derived from the U-shape structure in the Translator, by coupling low-level visual cues and high-level features. Hence, the texture details of predicted frames are enriched. Moreover, most existing methods capture the spatiotemporal dynamics with the Translator but fail to sufficiently utilize the spatial features of the Encoder. This inspires us to design a Spatial Masking (SM) module to mask partial encoding features during…
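The two ideas named in the abstract, coupling low-level and high-level features through attention and masking part of the encoded features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the official PyTorch code is in the repository listed on this page). The function names `pairwise_layer_attention` and `spatial_mask`, the tensor shapes, the residual connection, and the masking ratio are all assumptions made for illustration only.

```python
import numpy as np


def pairwise_layer_attention(low, high):
    """Hypothetical cross-attention between two layers' features.

    low, high: arrays of shape (N, C) holding N spatial positions with
    C channels each. High-level positions act as queries; low-level
    positions act as keys and values. Assumed design, not the paper's PLA.
    """
    scale = 1.0 / np.sqrt(low.shape[1])
    scores = (high @ low.T) * scale                # (N, N) similarity matrix
    scores -= scores.max(axis=1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    # Residual connection (an assumption): keep the high-level semantics
    # and add the attended low-level cues on top.
    return high + weights @ low


def spatial_mask(features, ratio, rng):
    """Zero out a random fraction of spatial positions (illustrative only).

    features: array of shape (C, H, W). The same positions are masked in
    every channel. `ratio` is a hypothetical hyperparameter in [0, 1].
    """
    c, h, w = features.shape
    n_masked = int(round(ratio * h * w))
    keep = np.ones(h * w, dtype=features.dtype)
    keep[rng.permutation(h * w)[:n_masked]] = 0
    return features * keep.reshape(1, h, w)


rng = np.random.default_rng(0)
low = rng.standard_normal((16, 8))    # e.g. a 4x4 map flattened to 16 positions
high = rng.standard_normal((16, 8))
fused = pairwise_layer_attention(low, high)
masked = spatial_mask(np.ones((8, 4, 4)), ratio=0.5, rng=rng)
```

Here `fused` keeps the shape of `high`, and `masked` has half of its 16 spatial positions set to zero across all channels. Where in the pipeline the masking is applied is not stated in the truncated abstract, so the sketch leaves that open.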

Code & Models

Repositories

mlvccn/pla_sm_videopred
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Human Pose and Action Recognition