TL;DR
This paper introduces an approach to few-shot fine-grained action recognition that combines a human-vision-inspired bidirectional attention module with contrastive meta-learning, so that actions differing only in subtle details (low inter-class variance) can be told apart from a few examples.
Contribution
It proposes a bidirectional attention module for capturing subtle details and contrastive meta-learning for discriminative representations, advancing few-shot fine-grained action recognition.
Findings
Achieves state-of-the-art performance on two large-scale datasets.
Captures subtle action details with the bidirectional attention module (BAM).
Improves the discriminative power of video representations with contrastive meta-learning (CML).
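To make the idea of "discriminative representations" concrete, here is a minimal sketch of a generic contrastive objective. This summary does not specify the loss used in the paper's contrastive meta-learning, so the InfoNCE-style loss below is a stand-in rather than the authors' formulation; the function names, the cosine similarity, and the temperature value are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not the paper's CML loss).

    Computes -log(exp(s_pos / t) / sum_k exp(s_k / t)), where the sum
    runs over the positive and all negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Hypothetical 2-D embeddings: the "hard" negative lies close to the anchor,
# as a different fine-grained class with low inter-class variance would.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negative = [-1.0, 0.0]
hard_negative = [0.9, 0.2]

loss_easy = info_nce(anchor, positive, [easy_negative])
loss_hard = info_nce(anchor, positive, [hard_negative])
```

In this toy setting the loss is larger for the hard negative than for the easy one, which is the pressure a contrastive objective applies to push apart classes that look nearly alike.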
Abstract
Fine-grained action recognition is attracting increasing attention due to the emerging demand for specific action understanding in real-world applications, whereas the data for rare fine-grained categories is very limited. Therefore, we propose the few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only a few samples given for each class. Although progress has been made on coarse-grained actions, existing few-shot recognition methods encounter two issues when handling fine-grained actions: the inability to capture subtle action details and the inadequacy in learning from data with low inter-class variance. To tackle the first issue, a human-vision-inspired bidirectional attention module (BAM) is proposed. Combining top-down task-driven signals with bottom-up salient stimuli, BAM captures subtle action details by accurately highlighting…
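The few-shot setting described in the abstract is commonly evaluated in N-way K-shot episodes. The sketch below shows one such episode with a nearest-prototype classifier. It is a generic baseline, not the paper's method (neither BAM nor contrastive meta-learning is modeled), and the class names, 2-D feature vectors, and helper functions are hypothetical.

```python
import math
import random

def sample_episode(features_by_class, n_way, k_shot, n_query, rng):
    """Sample an N-way K-shot episode: a support set and a query set."""
    classes = rng.sample(sorted(features_by_class), n_way)
    support, query = {}, []
    for label in classes:
        picked = rng.sample(features_by_class[label], k_shot + n_query)
        support[label] = picked[:k_shot]
        query += [(vec, label) for vec in picked[k_shot:]]
    return support, query

def prototype(vectors):
    """Element-wise mean of the support vectors of one class."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def classify(vec, prototypes):
    """Return the label of the nearest prototype (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(vec, prototypes[label]))

# Toy stand-in for video features: four well-separated 2-D clusters.
rng = random.Random(0)
centers = {
    "action_a": (0.0, 0.0),
    "action_b": (5.0, 5.0),
    "action_c": (0.0, 5.0),
    "action_d": (5.0, 0.0),
}
features_by_class = {
    label: [[cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)] for _ in range(10)]
    for label, (cx, cy) in centers.items()
}

support, query = sample_episode(
    features_by_class, n_way=3, k_shot=2, n_query=3, rng=rng
)
prototypes = {label: prototype(vecs) for label, vecs in support.items()}
accuracy = sum(classify(v, prototypes) == y for v, y in query) / len(query)
```

The toy clusters are deliberately far apart, so this baseline separates them easily. Fine-grained classes would sit much closer together in feature space, which is the difficulty the abstract attributes to low inter-class variance.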
Taxonomy
Methods
Bottleneck Attention Module
