Motion and Context-Aware Audio-Visual Conditioned Video Prediction

Yating Xu; Conghui Hu; Gim Hee Lee

arXiv:2212.04679 · cs.CV · September 21, 2023



TL;DR

This paper introduces a novel approach for audio-visual conditioned video prediction that decouples motion and appearance modeling, utilizing motion estimation and context-aware refinement to improve long-term prediction accuracy.

Contribution

The method separates motion and appearance modeling, incorporating motion-conditioned affine transformations and context-aware refinement for enhanced long-term video prediction.
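This page does not define "motion-conditioned affine transformations." The sketch below assumes one common way to build such a transformation: FiLM-style feature-wise modulation. A motion embedding predicts a per-channel scale and shift, and these are applied to appearance features. The function name `motion_conditioned_affine` and the weights `w_scale`, `b_scale`, `w_shift`, `b_shift` are hypothetical. The single linear projections stand in for whatever network the authors actually use.

```python
import numpy as np


def motion_conditioned_affine(appearance, motion, w_scale, b_scale, w_shift, b_shift):
    """Hypothetical FiLM-style affine modulation (an assumption, not the paper's code).

    appearance: (C, H, W) appearance feature map
    motion:     (D,) motion embedding
    w_scale, w_shift: (D, C) projection weights; b_scale, b_shift: (C,) biases
    """
    gamma = motion @ w_scale + b_scale  # (C,) per-channel scale predicted from motion
    beta = motion @ w_shift + b_shift   # (C,) per-channel shift predicted from motion
    # Broadcast the per-channel parameters over the spatial dimensions.
    return gamma[:, None, None] * appearance + beta[:, None, None]


if __name__ == "__main__":
    appearance = np.ones((2, 2, 2))
    motion = np.array([1.0, 0.0])
    w_scale = np.array([[2.0, 3.0], [0.0, 0.0]])
    w_shift = np.array([[1.0, -1.0], [5.0, 5.0]])
    out = motion_conditioned_affine(appearance, motion, w_scale, np.zeros(2), w_shift, np.zeros(2))
    print(out[:, 0, 0])  # channel 0: 2*1 + 1, channel 1: 3*1 - 1
```

In this design the motion signal does not produce pixels directly. It only rescales and shifts appearance features, which keeps the two factors separate in the way the method description suggests.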

Findings

1. Achieves competitive results on benchmark datasets.
2. Effectively models long-term video sequences.
3. Improves prediction quality by decoupling motion and appearance.

Abstract

The existing state-of-the-art method for audio-visual conditioned video prediction uses the latent codes of the audio-visual frames from a multimodal stochastic network and a frame encoder to predict the next visual frame. However, directly inferring per-pixel intensities for the next visual frame is extremely challenging because the image space is high-dimensional. To this end, we decouple audio-visual conditioned video prediction into motion modeling and appearance modeling. The multimodal motion estimation predicts future optical flow based on the audio-motion correlation. The visual branch recalls from a motion memory built from the audio features to enable better long-term prediction. We further propose context-aware refinement to counter the fading of global appearance context during long-term continuous warping. The global appearance context is extracted by the context…
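The abstract says the next frame is produced by warping with predicted optical flow but does not give the warping operator. Below is a minimal sketch that assumes standard backward warping with bilinear sampling on a single-channel frame. The function name `backward_warp` and the flow layout `(dx, dy)` in the last axis are assumptions for illustration, not the authors' code.

```python
import numpy as np


def backward_warp(frame, flow):
    """Backward-warp a (H, W) frame with a dense (H, W, 2) flow field.

    Assumed convention: output pixel (y, x) samples the source frame at
    (y + flow[y, x, 1], x + flow[y, x, 0]). Sample positions are clamped to
    the image border and values are bilinearly interpolated.
    """
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Continuous sampling coordinates, clamped so every lookup stays in bounds.
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    # Integer corners of the bilinear neighborhood.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional offsets act as interpolation weights.
    wx = sx - x0
    wy = sy - y0
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom


if __name__ == "__main__":
    frame = np.arange(12, dtype=float).reshape(3, 4)
    flow = np.zeros((3, 4, 2))
    flow[..., 0] = 1.0  # sample one pixel to the right everywhere
    print(backward_warp(frame, flow))
```

Warping repeatedly (feeding each output back in as the next input) re-interpolates and clamps at the borders on every step. This illustrates how appearance detail can degrade over a long rollout, which is the kind of degradation the context-aware refinement is described as addressing.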


Taxonomy

Topics: Advanced Image Processing Techniques · Image Enhancement Techniques · Image and Signal Denoising Methods

Methods: Balanced Selection