# Lightweight Network Architecture for Real-Time Action Recognition

**Authors:** Alexander Kozlov, Vadim Andronov, Yana Gritsenko

arXiv: 1905.08711 · 2019-05-22

## TL;DR

This paper introduces the Video Transformer Network (VTN), a lightweight human action recognition model. VTN runs in real time with high accuracy on standard benchmarks and needs only an RGB mono camera and a general-purpose CPU.

## Contribution

It presents a novel, efficient CNN architecture for action recognition. It also describes a model distillation technique that improves accuracy by distilling several models, each trained on a different modality, into a single model.
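The summary does not specify the distillation loss. The sketch below assumes a standard temperature-scaled knowledge-distillation objective. The softened predictions of several teachers are averaged, for example one teacher per input modality, and the student is trained to match this average while also fitting the ground-truth label. All function names and the `temperature` and `alpha` values are hypothetical illustrations, not the authors' settings.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature (higher T = softer output)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def multi_teacher_distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[Sequence[float]],
    label: int,
    temperature: float = 4.0,  # hypothetical value
    alpha: float = 0.5,        # hypothetical weight of the hard-label term
) -> float:
    """Hypothetical loss: alpha * CE(label) + (1 - alpha) * T^2 * KL(teacher_avg || student)."""
    # Average the softened predictions of all teachers (e.g. one per modality).
    teacher_probs = [softmax(t, temperature) for t in teacher_logits]
    n_classes = len(student_logits)
    target = [sum(tp[c] for tp in teacher_probs) / len(teacher_probs) for c in range(n_classes)]

    # Soft term: pull the student's softened distribution toward the teacher average.
    # Scaling by T^2 keeps its gradient magnitude comparable to the hard term.
    student_soft = softmax(student_logits, temperature)
    soft_loss = kl_divergence(target, student_soft) * temperature ** 2

    # Hard term: ordinary cross-entropy against the ground-truth label at T = 1.
    hard_loss = -math.log(softmax(student_logits)[label])

    return alpha * hard_loss + (1.0 - alpha) * soft_loss


# Usage with made-up logits for a 3-class problem and two hypothetical teachers.
rgb_teacher = [2.0, 0.5, -1.0]
flow_teacher = [1.5, 1.0, -0.5]
student = [1.0, 0.2, -0.3]
print(multi_teacher_distillation_loss(student, [rgb_teacher, flow_teacher], label=0))
```

Averaging the teachers' probabilities is only one option. Per-teacher weights, or a separate KL term for each teacher, are common alternatives, and the paper may use something different.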

## Key findings

- Achieves 56 FPS inference on a CPU
- Performs on par with most state-of-the-art methods on well-known action recognition datasets
- Offers a favorable speed/accuracy trade-off compared with other methods
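The 56 FPS figure can be converted into a per-frame time budget. This is a minimal sketch that rests on two assumptions. First, it assumes the reported number counts video frames processed per second on a single stream. Throughput measured with batching can differ from single-frame latency. Second, it assumes a camera running at 30 FPS, which is a typical value and does not come from the paper. The helper names are hypothetical.

```python
def frame_time_ms(fps: float) -> float:
    """Average time per frame implied by a throughput in frames per second."""
    return 1000.0 / fps


def keeps_up_with_camera(model_fps: float, camera_fps: float) -> bool:
    """True if the model processes frames at least as fast as the camera produces them."""
    return model_fps >= camera_fps


reported_fps = 56.0  # figure reported in the summary
assumed_camera_fps = 30.0  # assumption: a common webcam frame rate

print(f"{frame_time_ms(reported_fps):.1f} ms per frame")  # → 17.9 ms per frame
print(keeps_up_with_camera(reported_fps, assumed_camera_fps))
```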

## Abstract

In this work we present a new efficient approach to Human Action Recognition called Video Transformer Network (VTN). It leverages the latest advances in Computer Vision and Natural Language Processing and applies them to video understanding. The proposed method allows us to create lightweight CNN models that achieve high accuracy and real-time speed using just an RGB mono camera and a general-purpose CPU. Furthermore, we explain how to improve accuracy by distilling multiple models with different modalities into a single model. We compare our approach with state-of-the-art methods and show that it performs on par with most of them on well-known Action Recognition datasets. We benchmark the inference time of the models using a modern inference framework and argue that our approach compares favorably with other methods in terms of the speed/accuracy trade-off, running at 56 FPS on CPU. The models and the training code are available.
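The abstract says VTN applies advances from Natural Language Processing to video, but this summary does not describe the architecture. The name suggests Transformer-style self-attention over time. The sketch below shows that idea under that assumption. It applies single-head scaled dot-product attention across per-frame embeddings and then averages over time to produce one clip-level descriptor. The embeddings are hypothetical stand-ins for the output of a lightweight 2D CNN backbone. The learned query/key/value projections, positional encodings and classifier head are deliberately omitted, so this is not the paper's exact model.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(frames: Sequence[Sequence[float]]) -> List[Vector]:
    """Single-head scaled dot-product self-attention across time.

    Simplification: queries, keys and values are the frame embeddings themselves
    (no learned projection matrices, no positional encoding).
    """
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in frames:
        # Similarity of this frame to every frame in the clip.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in frames]
        weights = softmax(scores)
        # Each output is a convex combination of all frames' embeddings.
        outputs.append([sum(w * v[j] for w, v in zip(weights, frames)) for j in range(d)])
    return outputs


def clip_descriptor(frames: Sequence[Sequence[float]]) -> Vector:
    """Attend across frames, then average over time into one clip-level vector."""
    attended = self_attention(frames)
    t = len(attended)
    return [sum(vec[j] for vec in attended) / t for j in range(len(attended[0]))]


# Hypothetical 4-dimensional embeddings for a 3-frame clip.
embeddings = [[0.1, 0.3, 0.0, 0.5], [0.2, 0.1, 0.4, 0.0], [0.0, 0.2, 0.3, 0.1]]
print(clip_descriptor(embeddings))
```

In a real model, a small classifier would map the clip descriptor to action scores. Attending over compact per-frame features rather than running 3D convolutions over the whole clip is one plausible route to the CPU-friendly speed the abstract reports.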

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.08711/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1905.08711/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/1905.08711/full.md

---
Source: https://tomesphere.com/paper/1905.08711