End-to-End Semantic Video Transformer for Zero-Shot Action Recognition

Keval Doshi; Yasin Yilmaz

arXiv:2203.05156 · cs.CV · December 5, 2022 · 1 cites

TL;DR

This paper introduces an end-to-end transformer model for zero-shot video action recognition that captures long-range dependencies and outperforms existing methods on standard datasets, with a new setup ensuring proper zero-shot evaluation.

Contribution

It presents a novel transformer-based approach for zero-shot action recognition and a new experimental setup to properly evaluate zero-shot capabilities.

Findings

01

Outperforms the state of the art in zero-shot top-1 accuracy on UCF-101, HMDB-51, and ActivityNet

02

Efficiently captures long-range spatiotemporal dependencies

03

Provides a new standardized setup for zero-shot action recognition

Abstract

While video action recognition has been an active area of research for several years, zero-shot action recognition has only recently started gaining traction. In this work, we propose a novel end-to-end trained transformer model that captures long-range spatiotemporal dependencies efficiently, in contrast to existing approaches, which use 3D-CNNs. Moreover, to address a common ambiguity in existing works about which classes can be considered previously unseen, we propose a new experimentation setup that satisfies the zero-shot learning premise for action recognition by avoiding overlap between the training and testing classes. The proposed approach significantly outperforms the state of the art in zero-shot action recognition in terms of top-1 accuracy on the UCF-101, HMDB-51, and ActivityNet datasets. The code and proposed experimentation setup are available in…
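
The zero-shot premise in the abstract (no overlap between training and testing classes) can be illustrated with a small check. This is a minimal sketch, not the authors' procedure: the abstract does not say how overlap is detected, so matching on normalized class names is an assumption made here, and the label lists and the helper names `normalize`, `overlapping_classes`, and `remove_overlap` are hypothetical.

```python
import re


def normalize(label: str) -> str:
    """Lowercase a class label and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]+", "", label.lower())


def overlapping_classes(train_labels, test_labels):
    """Return the test labels whose normalized name also occurs in training."""
    train_keys = {normalize(name) for name in train_labels}
    return sorted(name for name in test_labels if normalize(name) in train_keys)


def remove_overlap(train_labels, test_labels):
    """Drop training classes that collide with any test class (assumed policy)."""
    test_keys = {normalize(name) for name in test_labels}
    return [name for name in train_labels if normalize(name) not in test_keys]


# Hypothetical label lists, for illustration only.
train = ["playing guitar", "Archery", "brushing teeth", "surfing"]
test = ["PlayingGuitar", "Archery", "YoYo"]

print(overlapping_classes(train, test))  # ['Archery', 'PlayingGuitar']
print(remove_overlap(train, test))       # ['brushing teeth', 'surfing']
```

Name matching only catches literal duplicates. Classes that are near-synonyms under different names would need a semantic comparison, which this sketch does not attempt.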

Code & Models

Repositories

secure-and-intelligent-systems-lab/semanticvideotransformer
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications