M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition

Hao Tang; Jun Liu; Shuanglin Yan; Rui Yan; Zechao Li; Jinhui Tang

arXiv:2308.03063 · cs.CV · August 8, 2023 · 2 cites

TL;DR

M$^3$Net is a framework for few-shot fine-grained action recognition that uses multi-view encoding, matching, and fusion to capture subtle action details and improve classification accuracy when labeled data is limited.

Contribution

The paper introduces a multi-view approach that combines encoding, matching, and fusion to improve fine-grained action recognition from only a few labeled samples.

Findings

01. Achieves state-of-the-art results on three benchmarks.

02. Effectively captures subtle action details.

03. Demonstrates superior generalization with limited data.

Abstract

Due to the scarcity of manually annotated data required for fine-grained video understanding, few-shot fine-grained (FS-FG) action recognition has gained significant attention, with the aim of classifying novel fine-grained action categories with only a few labeled instances. Despite the progress made in FS coarse-grained action recognition, current approaches encounter two challenges when dealing with the fine-grained action categories: the inability to capture subtle action details and the insufficiency of learning from limited data that exhibit high intra-class variance and inter-class similarity. To address these limitations, we propose M$^3$Net, a matching-based framework for FS-FG action recognition, which incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision…
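The abstract calls M$^3$Net a "matching-based" framework. As a rough illustration of what matching-based few-shot classification means in general, the sketch below averages the support embeddings of each class into a prototype and assigns a query to the class with the most similar prototype. This is a generic baseline under stated assumptions, not the paper's method: it has no multi-view encoding, matching, or fusion, and the function names (`cosine`, `prototypes`, `classify`) and the toy embeddings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def classify(query, support):
    """Return the label whose prototype is most similar to the query."""
    protos = prototypes(support)
    return max(protos, key=lambda label: cosine(query, protos[label]))


# Toy 2-way 2-shot episode. The 3-d embeddings are made up for illustration;
# in practice they would come from a video encoder.
support = {
    "class_a": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "class_b": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]],
}
print(classify([0.8, 0.3, 0.0], support))
```

With only a few support samples per class, the prototype is a noisy estimate, which is one reason high intra-class variance and inter-class similarity make the fine-grained setting hard for this kind of baseline.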

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Medical Imaging and Analysis