TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition

Mina Bishay; Georgios Zoumpourlis; Ioannis Patras

arXiv:1907.09021 · cs.CV · July 23, 2019 · 34 cites

TL;DR

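TARN is a neural network that uses temporal attention and meta-learning for few-shot and zero-shot action recognition. It learns to compare representations of variable temporal length (two videos in the few-shot case, a video and a class word vector in the zero-shot case) and needs no fine-tuning in the target domain.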

Contribution

Introduces a temporal attentive relation network that uses attention to align representations temporally and learns a deep distance measure on the aligned, segment-level representations, for few-shot and zero-shot action recognition.
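
The two ingredients named above, attention-based temporal alignment and a learned segment-level distance, can be illustrated with a minimal NumPy sketch. It assumes scaled dot-product attention and a one-hidden-layer MLP as the "deep distance"; the paper's actual embedding network, attention form and relation module may differ. The functions `align` and `relation_score` are hypothetical names, and the weights below are random and untrained, so the score only demonstrates the data flow.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def align(query, support):
    """Align a support video (n x d) to a query video (m x d).

    Assumption: scaled dot-product attention. Returns an (m x d) array with one
    attention-weighted support embedding per query segment, plus the (m x n)
    attention matrix. m and n may differ (variable temporal length).
    """
    d = query.shape[1]
    scores = query @ support.T / np.sqrt(d)  # (m, n) segment-to-segment similarity
    attn = softmax(scores, axis=1)           # each query segment attends over support segments
    return attn @ support, attn

def relation_score(query, support, w1, b1, w2, b2):
    """Hypothetical learned distance: an MLP applied to each [q_i, aligned_i] pair."""
    aligned, _ = align(query, support)
    pairs = np.concatenate([query, aligned], axis=1)  # (m, 2d)
    hidden = np.maximum(0.0, pairs @ w1 + b1)         # ReLU layer, (m, h)
    per_segment = hidden @ w2 + b2                    # (m,) one score per query segment
    # Average over segments, then squash to (0, 1) with a sigmoid.
    return 1.0 / (1.0 + np.exp(-per_segment.mean()))

# Usage: two videos with different numbers of segments (random toy features).
rng = np.random.default_rng(0)
d, h = 8, 16
w1, b1 = rng.normal(size=(2 * d, h)), np.zeros(h)
w2, b2 = rng.normal(size=h), 0.0
query = rng.normal(size=(5, d))    # 5 segments
support = rng.normal(size=(7, d))  # 7 segments
print(relation_score(query, support, w1, b1, w2, b2))
```

Aligning the support video to each query segment before scoring is what lets videos of different lengths be compared segment by segment. In a trained model the MLP weights would be learned end to end rather than sampled.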

Findings

01

Outperforms the state of the art in few-shot action recognition

02

Achieves competitive results in zero-shot action recognition

03

Requires no fine-tuning in the target domain and no additional memory representations

Abstract

In this paper we propose a novel Temporal Attentive Relation Network (TARN) for the problems of few-shot and zero-shot action recognition. At the heart of our network is a meta-learning approach that learns to compare representations of variable temporal length, that is, either two videos of different lengths (in the case of few-shot action recognition) or a video and a semantic representation such as a word vector (in the case of zero-shot action recognition). In contrast to other works in few-shot and zero-shot action recognition, we a) utilise attention mechanisms so as to perform temporal alignment, and b) learn a deep-distance measure on the aligned representations at the video-segment level. We adopt an episode-based training scheme and train our network in an end-to-end manner. The proposed method does not require any fine-tuning in the target domain or maintaining additional…
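
The episode-based training scheme mentioned in the abstract is commonly built on N-way K-shot episodes: each step draws N classes, K labelled support videos per class and some query videos, and class labels are re-indexed per episode. The following is a minimal standard-library sketch of such a sampler, assuming videos are identified by their index in a list of class labels. The function `sample_episode` and its parameters are hypothetical, and the paper's exact episode sizes and sampling details are not stated here.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_shot, n_query, rng=random):
    """Sample one N-way K-shot episode from a list of per-video class labels.

    Returns (support, query): lists of (video_index, episode_label) pairs, where
    episode_label in 0..n_way-1 is re-assigned for every episode.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with enough videos for both support and query can be drawn.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query videos")
    classes = rng.sample(sorted(eligible), n_way)
    support, query = [], []
    for ep_label, c in enumerate(classes):
        # Draw disjoint support and query videos for this class.
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, ep_label) for i in picked[:k_shot]]
        query += [(i, ep_label) for i in picked[k_shot:]]
    return support, query

# Usage: toy dataset with 10 classes and 20 videos per class.
labels = [i % 10 for i in range(200)]
support, query = sample_episode(labels, n_way=5, k_shot=1, n_query=3, rng=random.Random(0))
print(len(support), len(query))  # 5 15
```

Re-indexing labels inside each episode means the network cannot memorise specific training classes; it has to learn to compare query and support representations, which is what allows test-time classes to be handled without fine-tuning.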

Taxonomy

Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications