An Empirical Study of End-to-End Temporal Action Detection

Xiaolong Liu; Song Bai; Xiang Bai

arXiv:2204.02932·cs.CV·April 7, 2022

TL;DR

This paper empirically evaluates end-to-end temporal action detection, demonstrating its advantages over head-only methods, analyzing design choices, and proposing a fast, high-performing baseline for future research.

Contribution

The paper systematically compares end-to-end and head-only learning and develops an efficient, state-of-the-art baseline for temporal action detection.

Findings

01 End-to-end learning improves performance by up to 11%.

02 Mid-resolution baseline achieves state-of-the-art results.

03 End-to-end methods can be over 4 times faster than previous approaches.

Abstract

Temporal action detection (TAD) is an important yet challenging task in video understanding. It aims to simultaneously predict the semantic label and the temporal interval of every action instance in an untrimmed video. Rather than end-to-end learning, most existing methods adopt a head-only learning paradigm, where the video encoder is pre-trained for action classification and only the detection head on top of the encoder is optimized for TAD. The effect of end-to-end learning has not been systematically evaluated. In addition, an in-depth study of the efficiency-accuracy trade-off in end-to-end TAD is lacking. In this paper, we present an empirical study of end-to-end temporal action detection. We validate the advantage of end-to-end learning over head-only learning and observe up to 11% performance improvement. Besides, we study the effects of multiple design choices that affect the TAD…
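
The distinction the abstract draws between head-only and end-to-end learning comes down to which parameters receive gradient updates. The following is a minimal toy sketch of that distinction, not the paper's model: the encoder and detection head are each reduced to a single scalar weight, and the names `ToyDetector`, `train`, and `end_to_end` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyDetector:
    """Scalar stand-in (assumption): encoder h = w_enc * x, head y_hat = w_head * h."""
    w_enc: float = 0.5   # pretend this value came from classification pre-training
    w_head: float = 0.1  # detection head, always trained on the target task

    def predict(self, x: float) -> float:
        return self.w_head * (self.w_enc * x)


def train(model, data, end_to_end, lr=0.01, epochs=200):
    """Plain gradient descent on squared error; returns the final mean loss."""
    for _ in range(epochs):
        for x, y in data:
            err = model.predict(x) - y
            # Both gradients are computed before any weight is changed.
            grad_head = 2 * err * model.w_enc * x
            grad_enc = 2 * err * model.w_head * x
            model.w_head -= lr * grad_head
            if end_to_end:
                # End-to-end: the encoder is optimized together with the head.
                model.w_enc -= lr * grad_enc
            # Head-only: the encoder weight stays frozen at its pre-trained value.
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy targets following y = 2x

head_only = ToyDetector()
loss_head_only = train(head_only, data, end_to_end=False)

e2e = ToyDetector()
loss_e2e = train(e2e, data, end_to_end=True)

print("head-only  w_enc:", head_only.w_enc, "loss:", loss_head_only)
print("end-to-end w_enc:", e2e.w_enc, "loss:", loss_e2e)
```

In this toy problem both settings fit the data, so it says nothing about the accuracy gap reported in the paper; it only shows that the encoder weight moves in the end-to-end setting and stays fixed in the head-only one.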

Code & Models

Repositories

xlliu7/E2E-TAD (PyTorch, official)

Models

phubinhdang/badminton-video-trimmer (Hugging Face model)

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods