Grouped Spatial-Temporal Aggregation for Efficient Action Recognition

Chenxu Luo; Alan Yuille

arXiv:1909.13130 · cs.CV · October 1, 2019 · 21 cites

TL;DR

This paper introduces grouped spatial-temporal aggregation (GST), which splits feature channels into parallel spatial and temporal groups that focus on static and dynamic cues respectively, giving a more parameter-efficient model for action recognition tasks that require temporal reasoning.

Contribution

The paper proposes a novel parallel decomposition of feature channels into spatial and temporal groups, enhancing efficiency and interpretability in action recognition models.

Findings

01 Effective in action recognition tasks requiring temporal reasoning

02 More parameter-efficient than previous methods

03 Enables analysis of spatial and temporal feature contributions

Abstract

Temporal reasoning is an important aspect of video analysis. 3D CNNs perform well by exploring spatial-temporal features jointly in an unconstrained way, but they also increase the computational cost substantially. Previous works try to reduce the complexity by decoupling the spatial and temporal filters. In this paper, we propose a novel decomposition method that splits the feature channels into spatial and temporal groups in parallel. This decomposition lets the two groups focus on static and dynamic cues separately. We call this grouped spatial-temporal aggregation (GST). It is more parameter-efficient and enables us to quantitatively analyze the contributions of spatial and temporal features in different layers. We verify our model on several action recognition tasks that require temporal reasoning and show its effectiveness.
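The parameter-efficiency claim can be illustrated with a rough weight count. The sketch below is not taken from the paper or its repository: it assumes that both input and output channels are split by a single hypothetical `temporal_fraction`, that the spatial group uses a 1×k×k kernel and the temporal group a t×k×k kernel, and that biases are ignored. The function names are illustrative only.

```python
def conv3d_params(c_in, c_out, t=3, k=3):
    """Weight count of a full 3D convolution with a t x k x k kernel (no bias)."""
    return c_in * c_out * t * k * k


def grouped_st_params(c_in, c_out, temporal_fraction=0.5, t=3, k=3):
    """Weight count when channels are split into two parallel groups.

    Assumption (not from the source): the same fraction splits both the
    input and the output channels. The spatial group sees only a k x k
    kernel; the temporal group sees the full t x k x k kernel.
    """
    c_in_t = int(c_in * temporal_fraction)
    c_in_s = c_in - c_in_t
    c_out_t = int(c_out * temporal_fraction)
    c_out_s = c_out - c_out_t
    spatial = c_in_s * c_out_s * k * k        # 2D-style path
    temporal = c_in_t * c_out_t * t * k * k   # 3D path
    return spatial + temporal


full = conv3d_params(64, 64)
grouped = grouped_st_params(64, 64, temporal_fraction=0.5)
ratio = grouped / full
print(full, grouped, round(ratio, 3))
```

Under these assumptions, a 64-to-64 channel layer with an even split needs about one third of the weights of the full 3D convolution, because each group connects only to its own share of the channels and the spatial group drops the temporal kernel dimension.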

Code & Models

Repositories

chenxuluo/GST-video (PyTorch)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Gait Recognition and Analysis