End-to-End Learning of Motion Representation for Video Understanding

Lijie Fan; Wenbing Huang; Chuang Gan; Stefano Ermon; Boqing Gong; Junzhou Huang

arXiv:1804.00413 · cs.CV · April 3, 2018 · 31 citations

TL;DR

This paper introduces TVNet, an end-to-end trainable neural network that learns optical-flow-like features directly from data, so that motion feature extraction becomes part of the trained model rather than a separate pre-computation stage.

Contribution

The paper proposes TVNet, a neural network that unrolls the TV-L1 optical flow optimization as layers, enabling end-to-end training and task-specific feature learning for video analysis.
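To make "unrolling the optimization as layers" concrete, the following is a minimal NumPy sketch of the classic TV-L1 updates run for a fixed number of iterations, so the loop becomes a fixed-depth feed-forward computation. It is an illustration under stated assumptions, not the authors' implementation: it works at a single scale with no warping (so it only suits sub-pixel motion), the parameter values `lam`, `theta`, `tau` are common TV-L1 defaults rather than values taken from this page, and all function names are hypothetical. Which quantities TVNet actually makes trainable is not stated in the text above; the comments only mark plausible candidates.

```python
import numpy as np

def forward_grad(u):
    """Forward differences, zero at the last column/row."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def divergence(px, py):
    """Backward-difference divergence (negative adjoint of forward_grad)."""
    div = np.zeros_like(px)
    div[:, 0] += px[:, 0]
    div[:, 1:-1] += px[:, 1:-1] - px[:, :-2]
    div[:, -1] -= px[:, -2]
    div[0, :] += py[0, :]
    div[1:-1, :] += py[1:-1, :] - py[:-2, :]
    div[-1, :] -= py[-2, :]
    return div

def central_grad(img):
    """Central differences of the second frame, zero on the border."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = 0.5 * (img[:, 2:] - img[:, :-2])
    gy[1:-1, :] = 0.5 * (img[2:, :] - img[:-2, :])
    return gx, gy

def unrolled_tvl1(i0, i1, n_layers=50, lam=0.15, theta=0.3, tau=0.25):
    """Run n_layers TV-L1 iterations; each pass of the loop is one 'layer'."""
    i0 = np.asarray(i0, dtype=np.float64)
    i1 = np.asarray(i1, dtype=np.float64)
    # The finite-difference stencils are fixed here; in a learned variant
    # they are natural candidates for trainable convolution kernels.
    gx, gy = central_grad(i1)
    grad_sq = gx**2 + gy**2 + 1e-12
    rho_const = i1 - i0  # brightness residual linearised around zero flow
    u1 = np.zeros_like(i0)
    u2 = np.zeros_like(i0)
    p11, p12, p21, p22 = (np.zeros_like(i0) for _ in range(4))
    r = tau / theta
    for _ in range(n_layers):
        # Data term: pointwise thresholding step on the residual.
        rho = rho_const + gx * u1 + gy * u2
        bound = lam * theta * grad_sq
        step = np.where(rho < -bound, lam * theta,
                        np.where(rho > bound, -lam * theta, -rho / grad_sq))
        v1 = u1 + step * gx
        v2 = u2 + step * gy
        # Smoothness term: primal update followed by dual update.
        u1 = v1 + theta * divergence(p11, p12)
        u2 = v2 + theta * divergence(p21, p22)
        u1x, u1y = forward_grad(u1)
        u2x, u2y = forward_grad(u2)
        n1 = 1.0 + r * np.sqrt(u1x**2 + u1y**2)
        n2 = 1.0 + r * np.sqrt(u2x**2 + u2y**2)
        p11 = (p11 + r * u1x) / n1
        p12 = (p12 + r * u1y) / n1
        p21 = (p21 + r * u2x) / n2
        p22 = (p22 + r * u2y) / n2
    return u1, u2

def blob(cx, cy=16.0, size=33, sigma=4.0):
    """Synthetic test frame: a Gaussian blob centred at (cx, cy)."""
    y, x = np.mgrid[0:size, 0:size].astype(np.float64)
    return 255.0 * np.exp(-((x - cx)**2 + (y - cy)**2) / (2 * sigma**2))

frame0 = blob(16.0)
frame1 = blob(16.5)  # same blob moved half a pixel to the right
flow_x, flow_y = unrolled_tvl1(frame0, frame1)
```

Because every operation in the loop is a sum, product, threshold, or finite difference, the same computation can be expressed in an automatic-differentiation framework and trained jointly with a downstream classifier, which is the property the contribution statement relies on.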

Findings

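01. TVNet achieves higher accuracy than the compared methods on action recognition benchmarks.

02. Feature extraction is more efficient than in multi-stage pipelines, because optical flow features no longer need to be pre-computed and stored on disk.

03. End-to-end training lets TVNet learn task-specific motion features.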

Abstract

Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks. To fill this gap, we propose TVNet, a novel end-to-end trainable neural network, to learn optical-flow-like features from data. TVNet subsumes a specific optical flow solver, the TV-L1 method, and is initialized by unfolding its optimization iterations as neural layers. TVNet can therefore be used directly without any extra learning. Moreover, it can be naturally concatenated with other task-specific networks to formulate an end-to-end architecture, thus making our method more efficient than current multi-stage approaches by avoiding the need to pre-compute and store features on disk. Finally, the parameters of the TVNet can be further fine-tuned by end-to-end training. This enables TVNet to learn richer and task-specific patterns beyond…

Code & Models

Repositories

LijieFan/tvnet (TensorFlow)

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Diabetic Foot Ulcer Assessment and Management