Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Vittorio Mazzia; Simone Angarano; Francesco Salvetti; Federico Angelini; Marcello Chiaberge

arXiv:2107.00606·cs.CV·January 11, 2022

TL;DR

The paper introduces Action Transformer (AcT), a fully self-attentional model for real-time human action recognition from 2D pose data. It outperforms more elaborate architectures that mix convolutional, recurrent, and attentive layers, and it is released alongside MPOSE2021, a new benchmark dataset.

Contribution

It presents a novel self-attentional architecture for human action recognition (HAR) and introduces MPOSE2021, a large-scale dataset for benchmarking real-time action recognition.

Findings

01

Action Transformer outperforms existing models on MPOSE2021.

02

The approach achieves low latency and high accuracy in real-time HAR.

03

The dataset facilitates standardized evaluation for short-time HAR.

Abstract

Deep neural networks based purely on attention have been successful across several domains, relying on minimal architectural priors from the designer. In Human Action Recognition (HAR), attention mechanisms have been primarily adopted on top of standard convolutional or recurrent layers, improving the overall generalization capability. In this work, we introduce Action Transformer (AcT), a simple, fully self-attentional architecture that consistently outperforms more elaborate networks that mix convolutional, recurrent and attentive layers. To limit computational and energy requirements, and building on previous human action recognition research, the proposed approach exploits 2D pose representations over small temporal windows, providing a low-latency solution for accurate and effective real-time performance. Moreover, we open-source MPOSE2021, a new large-scale dataset, as an…
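To make the idea in the abstract concrete, here is a minimal NumPy sketch of self-attention applied to a short window of 2D poses. It is an illustration under stated assumptions, not the authors' implementation: the window length, keypoint count, embedding width, number of classes, the single attention head, the learnable class token, and all function names (`self_attention`, `classify_pose_window`, `init_params`) are hypothetical choices, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over a token sequence.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def init_params(rng, n_frames, n_keypoints, d_model, n_classes, scale=0.02):
    # Random (untrained) weights; every shape here is an illustrative assumption.
    def normal(*shape):
        return rng.normal(0.0, scale, shape)

    return {
        "w_embed": normal(n_keypoints * 2, d_model),  # linear projection of a frame
        "cls": normal(1, d_model),                    # assumed class token
        "pos": normal(n_frames + 1, d_model),         # absolute position encodings
        "w_q": normal(d_model, d_model),
        "w_k": normal(d_model, d_model),
        "w_v": normal(d_model, d_model),
        "w_out": normal(d_model, n_classes),          # classification head
    }


def classify_pose_window(poses, params):
    # poses: (n_frames, n_keypoints, 2) array of 2D keypoint coordinates.
    frames = poses.reshape(poses.shape[0], -1)        # one flat vector per frame
    tokens = frames @ params["w_embed"]               # (n_frames, d_model)
    tokens = np.vstack([params["cls"], tokens])       # prepend the class token
    tokens = tokens + params["pos"]
    attended, weights = self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    tokens = tokens + attended                        # residual connection
    logits = tokens[0] @ params["w_out"]              # read out the class token
    return softmax(logits), weights


rng = np.random.default_rng(0)
T, K, D, C = 30, 13, 64, 20  # hypothetical: frames, keypoints, width, classes
params = init_params(rng, T, K, D, C)
poses = rng.random((T, K, 2))
probs, weights = classify_pose_window(poses, params)
```

Each frame becomes one token, so attention cost grows with the square of the window length; keeping the temporal window small is what keeps this cheap, which matches the low-latency motivation in the abstract.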

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Layer Normalization · Dropout · Multi-Head Attention · Label Smoothing