# STAR-Net: Action Recognition using Spatio-Temporal Activation Reprojection

**Authors:** William McNally, Alexander Wong, John McPhee

arXiv: 1902.10024 · 2019-02-27

## TL;DR

STAR-Net reprojects the spatio-temporal activations produced by human pose estimation layers through a stack of 3D convolutions, targeting lower-complexity action recognition from low-cost RGB cameras. The authors report that it is proficient in single-environment, small-scale applications.

## Contribution

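The paper proposes spatio-temporal activation reprojection (STAR): activations generated by human pose estimation layers are reprojected in space and time by a stack of 3D convolutions. The stated motivation is to reduce network complexity and achieve higher performance than architectures that learn global appearance cues directly from imaging data.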

## Key findings

- On UTD-MHAD, outperforms several methods that use richer data modalities such as depth and inertial sensors
- Effective in single-environment, small-scale applications
- Designed to reduce network complexity relative to networks that learn global appearance cues directly from images

## Abstract

While depth cameras and inertial sensors have been frequently leveraged for human action recognition, these sensing modalities are impractical in many scenarios where cost or environmental constraints prohibit their use. As such, there has been recent interest in human action recognition using low-cost, readily available RGB cameras via deep convolutional neural networks. However, many of the deep convolutional neural networks proposed for action recognition thus far have relied heavily on learning global appearance cues directly from imaging data, resulting in highly complex network architectures that are computationally expensive and difficult to train. Motivated to reduce network complexity and achieve higher performance, we introduce the concept of spatio-temporal activation reprojection (STAR). More specifically, we reproject the spatio-temporal activations generated by human pose estimation layers in space and time using a stack of 3D convolutions. Experimental results on UTD-MHAD and J-HMDB demonstrate that an end-to-end architecture based on the proposed STAR framework (which we nickname STAR-Net) is proficient in single-environment and small-scale applications. On UTD-MHAD, STAR-Net outperforms several methods using richer data modalities such as depth and inertial sensors.
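
To make the core operation concrete, here is a minimal sketch of a single 3D convolution applied to a stack of per-frame pose heatmaps. It is not the authors' architecture: STAR-Net uses a learned, multi-channel stack of 3D convolutions inside an end-to-end network, whereas this example uses one fixed kernel on toy data. The helper names `conv3d_valid` and `toy_heatmap_sequence`, the 4-frame 5x5 input, and the temporal-difference kernel are all assumptions made for illustration.

```python
from itertools import product


def conv3d_valid(volume, kernel):
    """Naive single-channel 3D cross-correlation with 'valid' padding.

    volume: nested lists indexed [t][y][x] (time, height, width)
    kernel: nested lists indexed [dt][dy][dx]
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kT, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0])
    oT, oH, oW = T - kT + 1, H - kH + 1, W - kW + 1
    out = [[[0.0] * oW for _ in range(oH)] for _ in range(oT)]
    for t, y, x in product(range(oT), range(oH), range(oW)):
        acc = 0.0
        for dt, dy, dx in product(range(kT), range(kH), range(kW)):
            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
        out[t][y][x] = acc
    return out


def toy_heatmap_sequence(num_frames, size):
    """Hypothetical stand-in for pose-estimation activations.

    One 'joint' sits on the middle row and moves one pixel right per frame.
    """
    frames = []
    for t in range(num_frames):
        frame = [[0.0] * size for _ in range(size)]
        frame[size // 2][min(t, size - 1)] = 1.0
        frames.append(frame)
    return frames


# Stack the per-frame heatmaps into a (T, H, W) volume.
volume = toy_heatmap_sequence(num_frames=4, size=5)

# Illustrative kernel of shape (2, 1, 1): next frame minus current frame.
# A learned network would fit many such kernels rather than fix one by hand.
kernel = [[[-1.0]], [[1.0]]]

out = conv3d_valid(volume, kernel)
out_shape = (len(out), len(out[0]), len(out[0][0]))
print(out_shape)
```

Because the kernel spans two frames, each output value mixes information across time as well as space: the response is negative where the joint was and positive where it moved to. That joint space-time filtering is the property that distinguishes 3D convolutions from per-frame 2D convolutions.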

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.10024/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1902.10024/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/1902.10024/full.md

---
Source: https://tomesphere.com/paper/1902.10024