Temporal-Relational CrossTransformers for Few-Shot Action Recognition

Toby Perrett; Alessandro Masullo; Tilo Burghardt; Majid Mirmehdi; Dima Damen

arXiv:2101.06184·cs.CV·March 30, 2021


TL;DR

This paper introduces Temporal-Relational CrossTransformers (TRX), a novel method for few-shot action recognition that models temporal relations between frames, achieving state-of-the-art results on multiple datasets.

Contribution

The paper presents a new approach using CrossTransformer attention to construct class prototypes from temporal frame tuples, improving few-shot action recognition performance.
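The attention step described above can be illustrated with a minimal NumPy sketch of scaled dot-product cross-attention. Each query tuple attends over the pooled tuples of all support videos of one class, and the attention-weighted sum of projected support tuples gives a query-specific prototype. This is not the authors' released code. The helper names, the projection matrices `w_q`, `w_k` and `w_v`, the parameter-free layer norm and the squared Euclidean distance are all illustrative assumptions.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm (assumption): zero mean, unit variance per row.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def query_specific_prototype(query_tuples, support_tuples, w_q, w_k, w_v):
    """Build one prototype row per query tuple for a single class.

    query_tuples:   (P, D) representations of the P query tuples
    support_tuples: (K, M, D) M tuples from each of the K support videos
    w_q, w_k:       (D, dk) hypothetical query/key projections
    w_v:            (D, dv) hypothetical value projection
    Returns the prototype (P, dv), the projected query values (P, dv)
    and the attention weights (P, K*M).
    """
    num_videos, num_tuples, dim = support_tuples.shape
    # Pool tuples from all support videos so each query tuple can attend to any of them.
    s = support_tuples.reshape(num_videos * num_tuples, dim)
    q = layer_norm(query_tuples @ w_q)
    k = layer_norm(s @ w_k)
    scores = q @ k.T / np.sqrt(w_k.shape[1])   # (P, K*M) scaled dot products
    attn = softmax(scores, axis=1)             # each row sums to 1 over the class's support tuples
    prototype = attn @ (s @ w_v)               # attention-weighted sum of support values
    query_values = query_tuples @ w_v
    return prototype, query_values, attn


def class_distance(query_tuples, support_tuples, w_q, w_k, w_v):
    # Squared Euclidean distance between query values and the prototype (assumed metric).
    proto, qv, _ = query_specific_prototype(query_tuples, support_tuples, w_q, w_k, w_v)
    return float(np.sum((proto - qv) ** 2))


# Usage: a toy 5-way, 5-shot episode with random features (all sizes are arbitrary).
rng = np.random.default_rng(0)
D, dk, dv, P, K, M = 16, 8, 8, 6, 5, 6
w_q, w_k, w_v = (rng.standard_normal((D, d)) for d in (dk, dk, dv))
query = rng.standard_normal((P, D))
support_sets = [rng.standard_normal((K, M, D)) for _ in range(5)]
distances = [class_distance(query, s, w_q, w_k, w_v) for s in support_sets]
predicted_class = int(np.argmin(distances))
```

Because the softmax runs over every tuple of every support video of the class, the prototype can draw on relevant sub-sequences from several videos. A class average would weight all videos equally, and a single best match would use only one.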

Findings

01

Achieves state-of-the-art results on few-shot splits of Kinetics, SSv2, HMDB51 and UCF101

02

Outperforms prior work on SSv2 by 12%

03

Highlights the importance of modeling temporal relations

Abstract

We propose a novel approach to few-shot action recognition, finding temporally-corresponding frame tuples between the query and videos in the support set. Distinct from previous few-shot works, we construct class prototypes using the CrossTransformer attention mechanism to observe relevant sub-sequences of all support videos, rather than using class averages or single best matches. Video representations are formed from ordered tuples of varying numbers of frames, which allows sub-sequences of actions at different speeds and temporal offsets to be compared. Our proposed Temporal-Relational CrossTransformers (TRX) achieve state-of-the-art results on few-shot splits of Kinetics, Something-Something V2 (SSv2), HMDB51 and UCF101. Importantly, our method outperforms prior work on SSv2 by a wide margin (12%) due to its ability to model temporal relations. A detailed ablation showcases…
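The tuple construction described in the abstract can be sketched as follows, assuming per-frame features have already been extracted. This sketch enumerates ordered tuples with `itertools.combinations` and represents each tuple by concatenating its frames' features. The helper names and feature sizes are hypothetical, and any positional encoding or learned projection is omitted.

```python
from itertools import combinations

import numpy as np


def ordered_tuples(num_frames, cardinality):
    # Every increasing index tuple, e.g. (0, 3) or (1, 4, 7): frame order is kept,
    # while the gaps between indices vary, covering different speeds and offsets.
    return list(combinations(range(num_frames), cardinality))


def tuple_representations(frame_feats, cardinality):
    """frame_feats: (F, D) per-frame features -> (num_tuples, cardinality * D)."""
    idx = ordered_tuples(frame_feats.shape[0], cardinality)
    # Concatenate the features of the frames in each tuple, in temporal order.
    return np.stack([frame_feats[list(t)].reshape(-1) for t in idx])


# Usage: 8 frames with 4-dim features (hypothetical sizes), tuples of 2 and 3 frames.
feats = np.random.default_rng(0).standard_normal((8, 4))
for omega in (2, 3):
    reps = tuple_representations(feats, omega)
    print(omega, reps.shape)
# prints:
# 2 (28, 8)
# 3 (56, 12)
```

The number of tuples grows combinatorially with the cardinality (here C(8, 2) = 28 pairs and C(8, 3) = 56 triples). Small cardinalities and a modest number of sampled frames keep the comparison tractable.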


Taxonomy

Methods: CrossTransformers