DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding
Xiaokai Chen, Ke Gao

TL;DR
DenseImage Network introduces a compact video representation and a temporal-order-preserving CNN for efficient and accurate understanding of spatial-temporal evolution, achieving state-of-the-art results at lower computational cost.
Contribution
The paper proposes DenseImage, a novel compact video encoding method, and a temporal-order-preserving CNN strategy for improved video understanding.
Findings
Accurately captures spatial-temporal evolution in videos.
Achieves state-of-the-art results in action and gesture recognition.
Reduces time and memory costs significantly.
Abstract
Many of the leading approaches for video understanding are data-hungry and time-consuming, and they fail to capture the gist of spatial-temporal evolution efficiently. Recent research shows that CNNs can reason about static relations between entities in images. To further exploit their capacity for reasoning about dynamic evolution, we introduce a novel network module called DenseImage Network (DIN) with two main contributions. 1) A novel compact representation of video, called DenseImage, which distills the video's significant spatial-temporal evolution into a matrix and is primed for efficient video encoding. 2) A simple yet powerful learning strategy for video understanding, based on DenseImage and a temporal-order-preserving CNN, which includes a local temporal correlation constraint that captures temporal evolution at multiple time scales through different filter widths. Extensive…
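To make the two ideas in the abstract concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It assumes that each frame has already been encoded into a D-dimensional feature vector by some image CNN, and that a DenseImage is those vectors stacked row by row in temporal order, giving a T x D matrix. The temporal-order-preserving convolution is modeled as filters that span all D feature columns and slide only along the time axis, with several filter widths (numbers of consecutive frames) standing in for the "multiple time scales" described above. The helper names build_dense_image and temporal_conv_features are hypothetical, and random weights stand in for learned filters.

```python
import numpy as np

def build_dense_image(frame_features):
    """Hypothetical helper: stack per-frame feature vectors, kept in temporal
    order, into a T x D matrix (assumed to be the DenseImage layout)."""
    return np.stack(frame_features, axis=0)

def temporal_conv_features(dense_image, filter_widths, num_filters=4, seed=0):
    """Hypothetical sketch of multi-width temporal convolution.

    Each filter covers all D feature columns and w consecutive frames, so it
    only ever sees frames in their original order. Responses pass through a
    ReLU and are max-pooled over time; one pooled vector per width is
    concatenated. Random weights are placeholders for learned parameters.
    """
    rng = np.random.default_rng(seed)
    T, D = dense_image.shape
    pooled = []
    for w in filter_widths:
        if not 1 <= w <= T:
            raise ValueError(f"filter width {w} must be between 1 and T={T}")
        # Placeholder filter bank: num_filters filters of shape (w, D).
        W = rng.standard_normal((num_filters, w, D))
        # All length-w temporal windows: shape (T - w + 1, w, D).
        windows = np.stack([dense_image[t:t + w] for t in range(T - w + 1)])
        # Dot each window with each filter: shape (T - w + 1, num_filters).
        responses = np.maximum(np.einsum("twd,fwd->tf", windows, W), 0.0)
        pooled.append(responses.max(axis=0))  # max over time
    return np.concatenate(pooled)

# Usage: 16 frames, each described by an 8-dimensional feature vector.
frames = [np.random.default_rng(i).standard_normal(8) for i in range(16)]
dense = build_dense_image(frames)
feat = temporal_conv_features(dense, filter_widths=[2, 4, 8])
print(dense.shape, feat.shape)  # (16, 8) (12,)
```

With three widths and four filters each, the pooled descriptor has 12 entries whatever the video length, which is one way a compact matrix encoding can keep downstream cost low. A trained model would learn W by backpropagation and feed the descriptor to a classifier.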
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
