NUTA: Non-uniform Temporal Aggregation for Action Recognition
Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Hao Chen, Joseph Tighe

TL;DR
This paper introduces NUTA, an approach to action recognition that aggregates features only from the most informative temporal segments of a video, rather than sampling uniformly, to improve recognition accuracy.
Contribution
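The paper proposes NUTA, a non-uniform temporal aggregation method that learns to focus on the most informative video segments. It also proposes a synchronization method that temporally aligns these features with features from traditional uniform sampling, so that local and clip-level features can be combined.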
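To illustrate the general idea of weighting temporal segments unequally, here is a minimal sketch that contrasts uniform averaging with importance-weighted pooling. It assumes per-segment feature vectors and one relevance score per segment (in a real model, a small learned head would predict these). The function names and the softmax weighting are hypothetical illustrations, not the NUTA architecture described in the paper.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_aggregate(features: List[List[float]]) -> List[float]:
    """Baseline: average the per-segment features with equal weight."""
    t = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / t for d in range(dim)]


def nonuniform_aggregate(features: List[List[float]], scores: List[float]) -> List[float]:
    """Hypothetical non-uniform pooling: weight each segment by softmax(scores),
    so segments with higher relevance scores dominate the clip representation."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three segments, 2-D features
    print(uniform_aggregate(feats))
    # A high score on the first segment pulls the result toward its features.
    print(nonuniform_aggregate(feats, [20.0, -20.0, -20.0]))
```

With equal scores the weighted version reduces to the uniform average; a learned scorer is what would let a model concentrate on the informative parts of a clip.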
Findings
Achieved state-of-the-art results on four large-scale datasets.
Effectively identifies and focuses on the most relevant video segments.
Demonstrated improved recognition accuracy over uniform sampling methods.
Abstract
In action recognition research, a primary focus has been on how to construct and train networks that model the spatial-temporal volume of an input video. These methods typically sample a segment of an input clip uniformly along the temporal dimension. However, not all parts of a video are equally important for determining the action in the clip. In this work, we instead focus on learning where to extract features, so as to concentrate on the most informative parts of the video. We propose a method called non-uniform temporal aggregation (NUTA), which aggregates features only from informative temporal segments. We also introduce a synchronization method that temporally aligns our NUTA features with traditional uniformly sampled video features, so that both local and clip-level features can be combined. Our model has achieved state-of-the-art performance on four…
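The synchronization step in the abstract, aligning NUTA features with uniformly sampled clip features so both can be combined, can be illustrated with a hedged sketch. Here we assume (hypothetically) that the non-uniform branch produces features at sorted, possibly fractional temporal positions, and we align them to the uniform T-step grid by linear interpolation before concatenating per time step. The function names and the interpolation scheme are assumptions for illustration; the paper's actual synchronization mechanism may differ.

```python
import bisect
from typing import List


def interpolate_to_grid(positions: List[float], feats: List[List[float]], t_len: int) -> List[List[float]]:
    """Hypothetical alignment: resample features located at sorted, possibly
    fractional temporal positions onto the integer grid 0..t_len-1 by linear
    interpolation, clamping to the nearest feature outside the covered range."""
    out = []
    for t in range(t_len):
        if t <= positions[0]:
            out.append(list(feats[0]))
        elif t >= positions[-1]:
            out.append(list(feats[-1]))
        else:
            # Find j with positions[j] <= t < positions[j + 1].
            j = bisect.bisect_right(positions, t) - 1
            p0, p1 = positions[j], positions[j + 1]
            alpha = (t - p0) / (p1 - p0)
            out.append([(1 - alpha) * a + alpha * b for a, b in zip(feats[j], feats[j + 1])])
    return out


def synchronize_and_fuse(
    uniform_feats: List[List[float]], positions: List[float], nonuniform_feats: List[List[float]]
) -> List[List[float]]:
    """Align the non-uniform features to the uniform time steps, then
    concatenate local and clip-level features at each step."""
    aligned = interpolate_to_grid(positions, nonuniform_feats, len(uniform_feats))
    return [u + a for u, a in zip(uniform_feats, aligned)]


if __name__ == "__main__":
    uniform = [[1.0], [1.0], [1.0], [1.0]]  # four uniformly sampled time steps
    print(synchronize_and_fuse(uniform, [0.5, 2.5], [[0.0], [4.0]]))
```

Interpolating onto the uniform grid is one simple way to make both feature streams share a time axis; once aligned, they can be fused by concatenation, summation, or a learned layer.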
Videos
NUTA: Non-uniform Temporal Aggregation for Action Recognition (YouTube)
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
