OadTR: Online Action Detection with Transformers

Xiang Wang; Shiwei Zhang; Zhiwu Qing; Yuanjie Shao; Zhengrong Zuo; Changxin Gao; Nong Sang

arXiv:2106.11149 · cs.CV · June 22, 2021 · 1 citation

TL;DR

OadTR is a Transformer-based encoder-decoder framework for online action detection. It encodes historical observations and anticipates future context, improving accuracy as well as training and inference speed over RNN-based methods.

Contribution

The paper presents a Transformer-based architecture for online action detection that avoids the non-parallelism and vanishing gradients of RNNs, and evaluates it on the HDD, TVSeries, and THUMOS14 datasets.

Findings

01. Outperforms state-of-the-art methods in mAP and mcAP

02. Achieves higher training and inference speeds than RNN-based methods

03. Models both historical context and anticipated future context
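
The findings are reported in mAP and mcAP (mean average precision and its calibrated variant). As a rough illustration of what those metrics measure, here is a minimal sketch for a single action class, assuming per-frame confidence scores and binary labels. The function name `average_precision` and the `neg_pos_ratio` parameter are hypothetical, and the calibrated precision formula w·TP / (w·TP + FP), with w the ratio of negative to positive frames, is a commonly used definition assumed here rather than something stated on this page. Averaging the per-class values over all classes would give mAP or mcAP.

```python
def average_precision(scores, labels, neg_pos_ratio=None):
    """Average precision for one class from per-frame scores and 0/1 labels.

    If neg_pos_ratio (assumed: #negative frames / #positive frames) is given,
    precision is calibrated as w*TP / (w*TP + FP); otherwise w = 1, which
    reduces to ordinary precision TP / (TP + FP).
    """
    num_pos = sum(labels)
    if num_pos == 0:
        return 0.0
    w = 1.0 if neg_pos_ratio is None else neg_pos_ratio

    # Rank frames from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    total = 0.0
    for i in order:
        if labels[i] == 1:
            tp += 1
            # Accumulate (calibrated) precision at each true positive.
            total += (w * tp) / (w * tp + fp)
        else:
            fp += 1
    return total / num_pos


# Toy example: one positive frame ranked second among four frames.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [0, 1, 0, 0]
ap = average_precision(scores, labels)
cap = average_precision(scores, labels, neg_pos_ratio=3.0)
```

With three negative frames per positive frame, the calibrated value is higher than the plain one because each true positive is weighted by the class imbalance, which is the motivation usually given for calibration on heavily imbalanced data.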

Abstract

Most recent approaches for online action detection apply Recurrent Neural Networks (RNNs) to capture long-range temporal structure. However, RNNs cannot be parallelized across time steps and suffer from vanishing gradients, which makes them hard to optimize. In this paper, we propose a new encoder-decoder framework based on Transformers, named OadTR, to tackle these problems. The encoder, with an attached task token, captures the relationships and global interactions between historical observations. The decoder extracts auxiliary information by aggregating anticipated future clip representations. OadTR can therefore recognize current actions by encoding historical information and predicting future context simultaneously. We extensively evaluate the proposed OadTR on three challenging datasets: HDD, TVSeries, and THUMOS14. The experimental results show that OadTR achieves higher training and inference…
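
The encoder-decoder idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it uses one attention layer with a single head and no learned projections, and the names `attend`, `oadtr_like_features`, and `future_queries` are hypothetical. Mean pooling of the anticipated future tokens and concatenation with the task-token output are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


def oadtr_like_features(history, task_token, future_queries):
    """Combine a task-token summary of the past with pooled future estimates."""
    # Encoder step: append the task token and let every token attend
    # over the whole sequence (self-attention).
    sequence = history + [task_token]
    encoded = [attend(tok, sequence, sequence) for tok in sequence]
    task_out = encoded[-1]  # task token's view of the history

    # Decoder step: each future query cross-attends to the encoded sequence.
    future = [attend(q, encoded, encoded) for q in future_queries]

    # Aggregate anticipated future representations (assumed: mean pooling).
    pooled = [sum(f[i] for f in future) / len(future)
              for i in range(len(future[0]))]

    # Assumed: a classifier would consume the concatenated vector.
    return task_out + pooled


# Toy example: three historical clip features of dimension 2.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
task_token = [0.0, 0.0]
future_queries = [[0.5, 0.5], [1.0, 0.0]]
features = oadtr_like_features(history, task_token, future_queries)
```

Because every token in the sequence is processed by the same attention call independently of the others, the per-token work can be parallelized, which is the contrast with step-by-step RNN processing that the abstract draws.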

Code & Models

Repositories

wangxiang1230/OadTR
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications