Taylor saves for later: disentanglement for video prediction using Taylor representation

Ting Pan; Zhuqing Jiang; Jianan Han; Shiping Wen; Aidong Men; Haiying Wang

arXiv:2105.11062·cs.CV·May 25, 2021

TL;DR

This paper introduces a novel two-branch deep model using Taylor series expansion for disentangling features in video prediction, improving long-term accuracy and robustness.

Contribution

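The paper proposes a Taylor series-based recurrent module (TaylorCell) for long-term video prediction. TaylorCell combines a Taylor prediction unit (TPU), which predicts future frames from the first input frame's derivative information, with a memory correction unit (MCU), which uses all past frames to correct the TPU's prediction.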

Findings

01. Outperforms existing models on three datasets.

02. Effective in long-term video prediction.

03. Ablation studies confirm model components' importance.

Abstract

Video prediction is a challenging task with wide application prospects in meteorology and robot systems. Existing works fail to trade off short-term and long-term prediction performances and extract robust latent dynamics laws in video frames. We propose a two-branch seq-to-seq deep model to disentangle the Taylor feature and the residual feature in video frames by a novel recurrent prediction module (TaylorCell) and residual module. TaylorCell can expand the video frames' high-dimensional features into the finite Taylor series to describe the latent laws. In TaylorCell, we propose the Taylor prediction unit (TPU) and the memory correction unit (MCU). TPU employs the first input frame's derivative information to predict the future frames, avoiding error accumulation. MCU distills all past frames' information to correct the predicted Taylor feature from TPU. Correspondingly, the residual…
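To make the idea of a finite Taylor representation concrete, here is a minimal Python sketch. It is not the paper's TaylorCell: it works on plain NumPy feature vectors rather than learned features, it approximates derivatives with forward differences at the first time step (an assumption made here for illustration), and the function names `forward_difference_derivatives` and `taylor_predict` are hypothetical. It only illustrates why predicting every future step directly from one expansion point avoids feeding earlier predictions back in.

```python
import math
import numpy as np


def forward_difference_derivatives(features, order):
    """Approximate derivatives of order 0..order at the first time step.

    `features` has shape (T, D): T time steps of D-dimensional features,
    sampled with a unit time step. Forward differences are an assumption
    of this sketch, not the paper's method.
    """
    current = np.asarray(features, dtype=float)
    if current.shape[0] < order + 1:
        raise ValueError("need at least order + 1 time steps")
    derivs = [current[0]]
    for _ in range(order):
        current = np.diff(current, axis=0)  # next-order difference
        derivs.append(current[0])
    return derivs


def taylor_predict(derivs, dt):
    """Evaluate the finite Taylor series at offset dt from the first step."""
    return sum(d * dt**k / math.factorial(k) for k, d in enumerate(derivs))


# Toy features that move linearly over three observed time steps.
observed = np.array([[1.0, 2.0], [1.5, 1.0], [2.0, 0.0]])
derivs = forward_difference_derivatives(observed, order=2)

# Each future step is computed directly from the expansion point,
# so no predicted value is reused as an input.
future = [taylor_predict(derivs, dt) for dt in (3, 4, 5)]
```

Because the expansion is truncated and the derivatives are only approximated, such a prediction drifts for non-polynomial dynamics, which is the role the abstract assigns to the MCU's correction and to the residual branch.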

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Pose and Action Recognition