Learning Comprehensive Motion Representation for Action Recognition
Mingyu Wu, Boyuan Jiang, Donghao Luo, Junchi Yan, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Xiaokang Yang

TL;DR
This paper introduces a novel comprehensive motion representation method for action recognition that enhances feature extraction by emphasizing dynamic channels and motion regions, leading to improved performance on multiple datasets.
Contribution
The paper proposes Channel-wise and Spatial-wise Motion Enhancement modules integrated into 2D CNNs, capturing richer motion cues with physical interpretability for better action recognition.
Findings
Outperforms state-of-the-art on Something-Something V1 & V2 datasets.
Achieves competitive results on Kinetics-400.
Improves temporal reasoning accuracy with 16-frame input.
Abstract
For action recognition learning, 2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame. Recent efforts attempt to capture motion information by establishing inter-frame connections, but they still suffer from a limited temporal receptive field or high latency. Moreover, feature enhancement in action recognition is often performed along only the channel dimension or only the spatial dimension. To address these issues, we first devise a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector. The channel gates generated by CME incorporate the information from all the other frames in the video. We further propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point…
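The channel-gating idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' CME module: the function name `channel_gate_sketch`, the projection matrix `w`, the "mean of the other frames" summary, and the residual reweighting are all assumptions chosen only to show how a per-frame, per-channel gate could draw on information from the rest of the clip.

```python
import numpy as np

def channel_gate_sketch(x, w):
    """Hypothetical channel-wise gate (illustrative only, not the paper's CME).

    x: features of shape (T, C, H, W) for one video clip.
    w: assumed (C, C) projection weights; learned in a real model.
    """
    # Squeeze spatial dimensions: one descriptor per frame and channel -> (T, C)
    desc = x.mean(axis=(2, 3))
    # For each frame, summarize all the OTHER frames by their mean descriptor
    num_frames = desc.shape[0]
    others = (desc.sum(axis=0, keepdims=True) - desc) / (num_frames - 1)
    # Assumption: channels that deviate from the rest of the clip carry motion
    diff = np.abs(desc - others)
    # Sigmoid gate in (0, 1), one value per frame and channel
    gate = 1.0 / (1.0 + np.exp(-(diff @ w)))
    # Residual-style reweighting keeps the original features and adds emphasis
    return x * (1.0 + gate[:, :, None, None]), gate
```

With this sketch, a clip whose frames are all identical yields a gate of 0.5 everywhere, since no channel deviates from the others; channels that change over time receive larger gate values when `w` has positive weights.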
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: Convolution
