Video Transformer Network

Daniel Neimark; Omri Bar; Maya Zohar; Dotan Asselmann

arXiv:2102.00719·cs.CV·August 18, 2021

TL;DR

This paper introduces VTN, a transformer-based framework for video recognition that processes entire videos efficiently, achieving faster training and inference while maintaining competitive accuracy, and serving as a new baseline for future research.

Contribution

The paper proposes a generic transformer-based approach for video recognition that replaces 3D ConvNets, enabling faster training and inference with competitive accuracy.

Findings

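1. In wall-clock runtime, trains 16.1 times faster and runs inference 5.1 times faster than other state-of-the-art methods.

2. Requires 1.5 times fewer GFLOPs than other methods, while analyzing the whole video in a single end-to-end pass.

3. Achieves competitive results on Kinetics-400.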

Abstract

This paper presents VTN, a transformer-based framework for video recognition. Inspired by recent developments in vision transformers, we ditch the standard approach in video action recognition that relies on 3D ConvNets and introduce a method that classifies actions by attending to the entire video sequence information. Our approach is generic and builds on top of any given 2D spatial network. In terms of wall runtime, it trains $16.1 \times$ faster and runs $5.1 \times$ faster during inference while maintaining competitive accuracy compared to other state-of-the-art methods. It enables whole video analysis, via a single end-to-end pass, while requiring $1.5 \times$ fewer GFLOPs. We report competitive results on Kinetics-400 and present an ablation study of VTN properties and the trade-off between accuracy and inference speed. We hope our approach will serve as a new baseline and start a…
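The core idea in the abstract, per-frame features from a 2D spatial network followed by attention over the whole frame sequence, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: every name below is hypothetical, a random linear map stands in for the 2D spatial network, plain single-head softmax self-attention without learned projections stands in for whatever attention module the paper uses, and the prepended classification token is an assumption borrowed from common transformer classifiers.

```python
import math
import random


def frame_features(frame, backbone_w):
    """Hypothetical stand-in for a 2D spatial network: a linear map of flattened pixels."""
    return [sum(w * p for w, p in zip(row, frame)) for row in backbone_w]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens):
    """Single-head self-attention; queries, keys and values are the tokens themselves (assumption)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)])
    return out


def classify_video(frames, backbone_w, cls_token, head_w):
    """Embed every frame, attend over the full sequence, classify from the first token."""
    tokens = [cls_token] + [frame_features(f, backbone_w) for f in frames]
    cls_out = attend(tokens)[0]  # the classification token has attended to all frames
    logits = [sum(w * x for w, x in zip(row, cls_out)) for row in head_w]
    return softmax(logits)


# Toy data: 8 frames of 12 "pixels", 4-dim features, 3 action classes (all arbitrary).
rng = random.Random(0)
num_frames, num_pixels, dim, num_classes = 8, 12, 4, 3
frames = [[rng.random() for _ in range(num_pixels)] for _ in range(num_frames)]
backbone_w = [[rng.uniform(-1, 1) for _ in range(num_pixels)] for _ in range(dim)]
cls_token = [rng.uniform(-1, 1) for _ in range(dim)]
head_w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_classes)]

probs = classify_video(frames, backbone_w, cls_token, head_w)
print(len(probs), round(sum(probs), 6))
```

The spatial model runs once per frame and only the sequence step mixes information across time, which is why the approach can sit on top of any 2D network without 3D convolutions.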

Code & Models

Repositories

bomri/SlowFast/blob/master/projects/vtn/README.md
PyTorch · Official
