Temporal-Spatial Mapping for Action Recognition

Xiaolin Song; Cuiling Lan; Wenjun Zeng; Junliang Xing; Jingyu Yang; Xiaoyan Sun

arXiv:1809.03669 · cs.CV · September 12, 2018 · 1 citation

Open Access

TL;DR

This paper introduces Temporal-Spatial Mapping (TSM), a novel method for capturing temporal and spatial dynamics in videos, leading to improved human action recognition performance.

Contribution

The paper proposes a new VideoMap representation and a shallow CNN with temporal attention, advancing video action recognition accuracy.

Findings

01. Achieves 4.2% higher accuracy than Temporal Segment Network (TSN) on HMDB51.

02. Introduces a simple, effective operation for modeling temporal evolution.

03. Demonstrates state-of-the-art performance on the HMDB51 benchmark dataset.

Abstract

Deep learning models have enjoyed great success in image-related computer vision tasks such as image classification and object detection. For video-related tasks such as human action recognition, however, the advancements have not yet been as significant. The main challenge is the lack of effective and efficient models for capturing the rich temporal-spatial information in a video. We introduce a simple yet effective operation, termed Temporal-Spatial Mapping (TSM), for capturing the temporal evolution of the frames by jointly analyzing all the frames of a video. We propose a video-level 2D feature representation, referred to as VideoMap, obtained by transforming the convolutional features of all frames into a 2D feature map. With each row being the vectorized feature representation of a frame, the temporal-spatial features are compactly represented, while the temporal dynamic evolution is also well embedded.…
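
The row-per-frame layout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `vectorize` and `build_video_map`, the channel-major flattening order, and the toy feature values are all assumptions made for illustration, and the CNN that would produce the per-frame convolutional features is not shown.

```python
from typing import List

# One frame's convolutional feature as nested lists, shape (C, H, W).
Feature = List[List[List[float]]]


def vectorize(feature: Feature) -> List[float]:
    """Flatten a (C, H, W) feature into one row vector.

    Channel-major order is an assumption; the abstract does not specify it.
    """
    return [v for channel in feature for row in channel for v in row]


def build_video_map(frame_features: List[Feature]) -> List[List[float]]:
    """Stack the vectorized features of T frames into a T x (C*H*W) map.

    Row t holds frame t, so reading down the rows follows time.
    """
    rows = [vectorize(f) for f in frame_features]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all frames must have features of the same shape")
    return rows


# Toy input: 3 frames, each with a 2-channel 1x2 feature (made-up values).
frames = [
    [[[0.0, 0.1]], [[0.2, 0.3]]],
    [[[1.0, 1.1]], [[1.2, 1.3]]],
    [[[2.0, 2.1]], [[2.2, 2.3]]],
]
video_map = build_video_map(frames)
```

Because the result is an ordinary 2D array with time along one axis, a 2D CNN applied to it can see patterns that span several frames at once, which is the motivation the abstract gives for the representation.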

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications