DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding

Xiaokai Chen; Ke Gao

arXiv:1805.07550·cs.CV·May 22, 2018·6 cites

TL;DR

DenseImage Network introduces a compact video representation and a temporal-order-preserving CNN for efficient and accurate understanding of spatial-temporal evolution, achieving state-of-the-art results at lower computational cost.

Contribution

The paper proposes DenseImage, a novel compact video encoding method, and a temporal-order-preserving CNN strategy for improved video understanding.

Findings

01 Accurately captures spatial-temporal evolution in videos.

02 Achieves state-of-the-art results in action and gesture recognition.

03 Reduces time and memory costs significantly.

Abstract

Many of the leading approaches to video understanding are data-hungry and time-consuming, and fail to capture the gist of spatial-temporal evolution efficiently. Recent research shows that CNNs can reason about static relations among entities in images. To further exploit this capacity for reasoning about dynamic evolution, we introduce a novel network module called DenseImage Network (DIN) with two main contributions. 1) A novel compact representation of video that distills its significant spatial-temporal evolution into a matrix called DenseImage, primed for efficient video encoding. 2) A simple yet powerful learning strategy for video understanding, based on DenseImage and a temporal-order-preserving CNN, which contains a local temporal correlation constraint that captures temporal evolution at multiple time scales using different filter widths. Extensive…
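The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a DenseImage can be approximated as a matrix with one row per frame in temporal order, that each filter spans the full feature dimension and slides along the time axis only, and that responses are max-pooled over time. The function names (`build_dense_image`, `temporal_conv`, `multi_scale_response`), the random stand-in features, and the filter widths 1, 2, and 3 are all hypothetical choices made for illustration.

```python
import numpy as np


def build_dense_image(frame_features):
    """Stack per-frame feature vectors in temporal order into a (T, D) matrix.

    Assumption: one row per frame; how the paper selects or computes these
    features is not specified in the abstract.
    """
    return np.stack(frame_features, axis=0)


def temporal_conv(dense_image, kernel):
    """Slide a (w, D) kernel along the time axis only (no padding).

    Because the kernel covers the whole feature dimension, each output
    value depends on w consecutive frames in their original order.
    """
    num_frames, feat_dim = dense_image.shape
    width, kernel_dim = kernel.shape
    if kernel_dim != feat_dim or width > num_frames:
        raise ValueError("kernel must be (w, D) with w <= T")
    return np.array([
        np.sum(dense_image[t:t + width] * kernel)
        for t in range(num_frames - width + 1)
    ])


def multi_scale_response(dense_image, kernels):
    """Max-pool each filter's temporal response (pooling is an assumption)."""
    return np.array([temporal_conv(dense_image, k).max() for k in kernels])


# Stand-in data: 16 frames with 8-dimensional features (hypothetical sizes).
rng = np.random.default_rng(0)
frames = [rng.standard_normal(8) for _ in range(16)]
dense_image = build_dense_image(frames)

# One filter per temporal scale; widths 1, 2, 3 are illustrative only.
kernels = [rng.standard_normal((w, 8)) for w in (1, 2, 3)]
features = multi_scale_response(dense_image, kernels)
```

Here `features` holds one value per filter width, so wider filters summarize longer runs of consecutive frames. In a trained network the kernels would be learned rather than sampled at random.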

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning