Compound Prototype Matching for Few-shot Action Recognition

Yifei Huang; Lijin Yang; Yoichi Sato

arXiv:2207.05515 · cs.CV · October 17, 2023

TL;DR

This paper introduces a novel few-shot action recognition method that summarizes videos into compound prototypes, enabling effective comparison of actions with temporal variations and achieving state-of-the-art results.

Contribution

It proposes a new compound prototype matching approach that combines global and focused prototypes for improved few-shot action recognition.

Findings

01. Achieves state-of-the-art performance on multiple benchmarks.

02. Effectively handles temporal variations in action videos.

03. Demonstrates the superiority of compound prototypes over existing methods.

Abstract

Few-shot action recognition aims to recognize novel action classes using only a small number of labeled training samples. In this work, we propose a novel approach that first summarizes each video into compound prototypes consisting of a group of global prototypes and a group of focused prototypes, and then compares video similarity based on the prototypes. Each global prototype is encouraged to summarize a specific aspect from the entire video, for example, the start/evolution of the action. Since no clear annotation is provided for the global prototypes, we use a group of focused prototypes to focus on certain timestamps in the video. We compare video similarity by matching the compound prototypes between the support and query videos. The global prototypes are directly matched to compare videos from the same perspective, for example, to compare whether two actions start similarly. For…
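The direct matching of global prototypes described in the abstract can be illustrated with a minimal sketch. It covers only the global group: the abstract is cut off before it explains how focused prototypes are matched, so that part is left out. Everything concrete here is an assumption for illustration, not the authors' implementation: the names `cosine`, `global_match_score`, and `classify`, the use of cosine similarity, the averaging over prototypes, and the toy 2-D vectors standing in for learned prototypes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def global_match_score(support_protos, query_protos):
    """Compare the k-th support prototype with the k-th query prototype.

    Index-aligned comparison mirrors "directly matched": prototype k is
    assumed to summarize the same aspect (e.g. the start) in both videos.
    Averaging the per-prototype similarities is an assumption.
    """
    if not support_protos or len(support_protos) != len(query_protos):
        raise ValueError("both videos need the same, non-zero number of prototypes")
    sims = [cosine(s, q) for s, q in zip(support_protos, query_protos)]
    return sum(sims) / len(sims)


def classify(query_protos, support_by_class):
    """Assign the query to the support class with the highest match score."""
    scores = {
        label: global_match_score(protos, query_protos)
        for label, protos in support_by_class.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy one-shot episode: two classes, two global prototypes per video.
support = {
    "open": [[1.0, 0.0], [0.0, 1.0]],
    "close": [[0.0, 1.0], [1.0, 0.0]],
}
query = [[2.0, 0.0], [0.0, 3.0]]
label, scores = classify(query, support)
print(label, scores)
```

In the toy episode the two support classes hold the same vectors in swapped order, so only an index-aligned comparison tells them apart. That is the sense in which direct matching compares two videos "from the same perspective".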

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications