Bidirectional Multirate Reconstruction for Temporal Modeling in Videos

Linchao Zhu; Zhongwen Xu; Yi Yang

arXiv:1611.09053·cs.CV·November 29, 2016

TL;DR

This paper introduces an unsupervised bidirectional multirate reconstruction method for temporal modeling in videos, effectively handling motion speed variations and improving performance in event detection and captioning tasks.

Contribution

It proposes a novel multirate visual recurrent model with bidirectional reconstruction for unsupervised temporal learning in videos, addressing motion speed variance.

Findings

1. Achieves a 10.4% relative improvement in event detection on MEDTest-13.

2. Sets a new state of the art in video captioning on YouTube2Text.

3. Models temporal information effectively by learning from untrimmed videos.

Abstract

Despite the recent success of neural networks in image feature learning, a major problem in the video domain is the lack of sufficient labeled data for learning to model temporal information. In this paper, we propose an unsupervised temporal modeling method that learns from untrimmed videos. The speed of motion varies constantly, e.g., a man may run quickly or slowly. We therefore train a Multirate Visual Recurrent Model (MVRM) by encoding frames of a clip with different intervals. This learning process makes the learned model more capable of dealing with motion speed variance. Given a clip sampled from a video, we use its past and future neighboring clips as the temporal context, and reconstruct the two temporal transitions, i.e., present $\to$ past transition and present $\to$ future transition, reflecting the temporal information in different views. The proposed method…
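The multirate sampling and past/future context described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the clip length, the set of intervals, the clamping at video boundaries, and the choice of neighbor offset are hypothetical and are not taken from the paper. The sketch covers only index selection, not the recurrent encoder or the reconstruction objective.

```python
from typing import Dict, List, Tuple


def sample_clip(center: int, num_frames: int, interval: int, video_len: int) -> List[int]:
    """Return frame indices for one clip around `center`, spaced `interval` frames apart.

    Indices that fall outside the video are clamped to the first or last frame
    (an assumption; other boundary policies are possible).
    """
    half = num_frames // 2
    raw = [center + (i - half) * interval for i in range(num_frames)]
    return [min(max(j, 0), video_len - 1) for j in raw]


def multirate_clips(center: int, num_frames: int, intervals: List[int],
                    video_len: int) -> Dict[int, List[int]]:
    """Sample the same moment at several frame intervals (hypothetical rates)."""
    return {r: sample_clip(center, num_frames, r, video_len) for r in intervals}


def context_centers(center: int, num_frames: int, interval: int,
                    video_len: int) -> Tuple[int, int]:
    """Pick centers for the past and future neighboring clips.

    The neighbors are placed one clip span before and after the present clip,
    which is an illustrative choice rather than the paper's setting.
    """
    span = num_frames * interval
    past = max(center - span, 0)
    future = min(center + span, video_len - 1)
    return past, future


if __name__ == "__main__":
    clips = multirate_clips(center=50, num_frames=5, intervals=[1, 2, 4], video_len=100)
    for rate, idx in clips.items():
        print(f"interval {rate}: {idx}")
    print("past/future centers:", context_centers(50, 5, 2, 100))
```

A larger interval covers a longer time span with the same number of frames, so a fast motion sampled densely and a slow motion sampled sparsely can yield similar frame sequences. The present clip would be encoded, and the past and future clips would serve as the two reconstruction targets.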

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings