TL;DR
This paper introduces a novel joint-based action recognition model that leverages a shared motion encoder, joint selection, and a contrastive loss, achieving state-of-the-art results on multiple datasets.
Contribution
The paper presents a new joint-based action recognition model with a joint selector, a contrastive loss, and geometry-aware augmentation, advancing the accuracy of joint-based methods.
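The contrastive idea can be illustrated with a generic supervised contrastive loss. This is a minimal sketch, not the paper's formulation: the function names, the cosine similarity, the temperature value, and the toy features are all assumptions made for illustration. It shows only the general behavior the summary describes, where features sharing an action label are pulled together.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (hypothetical stand-in).

    For each anchor, features with the same label are positives and all
    other features are negatives. Anchors without a positive are skipped.
    """
    n = len(features)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarities to every other sample.
        sims = {j: cosine(features[i], features[j]) / temperature
                for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives.
        losses.append(-sum(sims[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0

# Toy features (made up): two tight clusters in 2-D.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = supervised_contrastive_loss(feats, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(feats, [0, 1, 0, 1])
```

When the labels agree with the clusters the loss is lower than when they cut across them, which is the pressure that pulls same-action features together during training.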
Findings
Reports significant improvements over the state of the art on the JHMDB, HMDB, Charades, and AVA datasets.
Late fusion with RGB and optical-flow streams further improves performance.
Outperforms the baseline on the Mimetics dataset of out-of-context actions.
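Late fusion, as mentioned in the findings, combines the class scores of separately trained streams. The summary does not say how the paper fuses its streams, so the sketch below assumes one common scheme: a weighted average of per-stream softmax probabilities. The stream names, logits, and uniform weights are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(stream_logits, weights=None):
    """Weighted average of per-stream class probabilities (assumed scheme).

    stream_logits maps a stream name to its list of class logits.
    weights maps a stream name to a non-negative weight (uniform if None).
    """
    names = list(stream_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    num_classes = len(stream_logits[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(stream_logits[name])
        for c in range(num_classes):
            fused[c] += weights[name] / total_weight * probs[c]
    return fused

# Hypothetical logits for three classes from three streams.
scores = {
    "joints": [2.0, 0.5, 0.1],
    "rgb": [0.2, 1.5, 0.3],
    "flow": [1.0, 1.2, 0.1],
}
fused = late_fusion(scores)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Averaging probabilities rather than raw logits keeps streams with different logit scales from dominating the fused score; the weights could instead be tuned on a validation set.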
Abstract
Recent progress on action recognition has mainly focused on RGB and optical flow features. In this paper, we approach the problem of joint-based action recognition. Unlike other modalities, the constellation of joints and their motion yields models that carry succinct human motion information for activity recognition. We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder before performing collective reasoning. Our joint selector module re-weights the joint information to select the most discriminative joints for the task. We also propose a novel joint-contrastive loss that pulls together groups of joint features which convey the same action. We strengthen the joint-based representations by using a geometry-aware data augmentation technique which jitters pose heatmaps while retaining the dynamics…
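The first two steps of the abstract (encode each joint with the same encoder, then re-weight joints before pooling) can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's architecture: `shared_encoder` is a hand-written stand-in for a learned network, and `joint_selector`, its softmax scoring, and the `score_weights` parameter are hypothetical.

```python
import math

def shared_encoder(trajectory):
    """Toy stand-in for a learned motion encoder.

    trajectory is a list of (x, y) positions with at least two frames.
    Returns the mean absolute displacement along each axis.
    """
    steps = len(trajectory) - 1
    dx = sum(abs(trajectory[t + 1][0] - trajectory[t][0]) for t in range(steps)) / steps
    dy = sum(abs(trajectory[t + 1][1] - trajectory[t][1]) for t in range(steps)) / steps
    return [dx, dy]

def joint_selector(joint_features, score_weights):
    """Re-weight joints with a softmax over linear scores (assumed form).

    Returns the per-joint weights and the weighted sum of joint features.
    """
    scores = [sum(w * f for w, f in zip(score_weights, feat)) for feat in joint_features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(joint_features[0])
    pooled = [sum(weights[j] * joint_features[j][d] for j in range(len(joint_features)))
              for d in range(dim)]
    return weights, pooled

# Three hypothetical joints over three frames: static, fast-moving, slightly moving.
trajectories = [
    [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)],
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],
    [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1)],
]
# The same encoder is applied to every joint, so its parameters are shared.
features = [shared_encoder(traj) for traj in trajectories]
weights, pooled = joint_selector(features, score_weights=[1.0, 1.0])
```

Because the encoder is shared, the model size does not grow with the number of joints, and the selector weights give a direct reading of which joints contribute most to the pooled representation.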
Videos
Pose and Joint-Aware Action Recognition (YouTube)
