TL;DR
This paper introduces a correlation-based feature representation for first-person activity recognition that captures scene dynamics and local motions, improving classification performance on two challenging datasets.
Contribution
It proposes a novel correlation-based method that models inter- and intra-time-series relations among high-dimensional CNN features, improving activity recognition accuracy.
Findings
Outperforms state-of-the-art methods on two datasets
Efficiently captures scene dynamics and local motions
Produces highly discriminative features for linear classification
Abstract
In this paper, a simple yet efficient activity recognition method for first-person video is introduced. The proposed method is suited to representing high-dimensional features, such as those extracted by convolutional neural networks (CNNs). The features extracted per frame (or per segment) are treated as a set of time series, and inter- and intra-time-series relations are used to build the video descriptor. To find the inter-time-series relations, the series are grouped and the linear correlation between each pair of groups is calculated; these relations can represent the scene dynamics and local motions. The introduced grouping strategy helps considerably reduce the computational cost. Furthermore, we split the series in the temporal direction in order to preserve long-term motions and better focus on each local time window. In order to extract the cyclic motion…
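The grouping, pairwise-correlation, and temporal-splitting steps described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the series within a group are combined, so each group is summarized here by the mean of its contiguous feature dimensions. The temporal split uses equal, non-overlapping windows. The function names `pearson`, `group_series`, `window_descriptor`, and `video_descriptor` are hypothetical, and the intra-time-series (cyclic motion) component is omitted because the abstract is truncated before describing it.

```python
import math
from itertools import combinations
from typing import List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear (Pearson) correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def group_series(frames: List[List[float]], num_groups: int) -> List[List[float]]:
    """ASSUMPTION: split the D feature dimensions into contiguous, equal-size groups
    and average each group per frame, giving one summary time series per group."""
    dim = len(frames[0])
    size = dim // num_groups  # assumes dim is divisible by num_groups
    grouped = []
    for g in range(num_groups):
        lo, hi = g * size, (g + 1) * size
        grouped.append([sum(f[lo:hi]) / size for f in frames])
    return grouped

def window_descriptor(frames: List[List[float]], num_groups: int) -> List[float]:
    """Correlation between every pair of group series inside one time window."""
    series = group_series(frames, num_groups)
    return [pearson(series[i], series[j])
            for i, j in combinations(range(num_groups), 2)]

def video_descriptor(frames: List[List[float]], num_groups: int,
                     num_windows: int) -> List[float]:
    """ASSUMPTION: equal, non-overlapping windows; leftover frames are dropped.
    Per-window correlation vectors are concatenated into one video descriptor."""
    length = len(frames) // num_windows
    desc: List[float] = []
    for w in range(num_windows):
        desc.extend(window_descriptor(frames[w * length:(w + 1) * length], num_groups))
    return desc
```

With illustrative numbers (not taken from the paper), 4096-dimensional CNN features would need 4096 × 4095 / 2 = 8,386,560 pairwise correlations per window without grouping, versus 64 × 63 / 2 = 2,016 with 64 groups, which is the kind of saving the grouping strategy targets. The resulting fixed-length vector could then be passed to a linear classifier.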
