Activity Graph Transformer for Temporal Action Localization

Megha Nawhal; Greg Mori

arXiv:2101.08540 · cs.CV · January 29, 2021 · 42 cites

Open Access

TL;DR

The paper introduces Activity Graph Transformer, a novel end-to-end model that uses graph reasoning to improve temporal action localization in videos, especially for non-linear and overlapping actions.

Contribution

It proposes a graph-based transformer model for directly predicting action instances, addressing limitations of sequential processing in complex video scenarios.

Findings

01. Outperforms state-of-the-art on THUMOS14, Charades, and EPIC-Kitchens-100 datasets.

02. Effectively captures non-linear temporal dependencies and overlapping actions.

03. Demonstrates significant accuracy improvements over existing methods.

Abstract

We introduce Activity Graph Transformer, an end-to-end learnable model for temporal action localization, that receives a video as input and directly predicts a set of action instances that appear in the video. Detecting and localizing action instances in untrimmed videos requires reasoning over multiple action instances in a video. The dominant paradigms in the literature process videos temporally to either propose action regions or directly produce frame-level detections. However, sequential processing of videos is problematic when the action instances have non-sequential dependencies and/or non-linear temporal ordering, such as overlapping action instances or re-occurrence of action instances over the course of the video. In this work, we capture this non-linear temporal structure by reasoning over the videos as non-sequential entities in the form of graphs. We evaluate our model on…
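To make the abstract's notion of non-linear temporal structure concrete, the sketch below represents a video's annotations as a set of action instances and finds the overlapping and re-occurring ones. It is an illustration only, not the paper's model: the `ActionInstance` class, the helper functions, and the example labels and timestamps are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ActionInstance:
    """One labelled action segment (hypothetical structure, not from the paper)."""
    label: str
    start: float  # seconds
    end: float    # seconds


def overlapping_pairs(instances):
    """Return every pair of instances whose time intervals intersect."""
    return [
        (a, b)
        for a, b in combinations(instances, 2)
        if a.start < b.end and b.start < a.end
    ]


def recurring_labels(instances):
    """Return the labels that occur more than once in the video, sorted."""
    counts = Counter(inst.label for inst in instances)
    return sorted(label for label, n in counts.items() if n > 1)


# Made-up annotations for a single untrimmed video.
video = [
    ActionInstance("open fridge", 0.0, 4.0),
    ActionInstance("hold cup", 2.0, 12.0),
    ActionInstance("pour milk", 5.0, 9.0),
    ActionInstance("open fridge", 15.0, 18.0),
]

for a, b in overlapping_pairs(video):
    print(f"overlap: {a.label} <-> {b.label}")
print("recurring:", recurring_labels(video))
```

In this toy example, "hold cup" overlaps two other actions and "open fridge" occurs twice, so no single left-to-right ordering of the instances captures how they relate. This is the kind of structure the abstract argues is awkward for purely sequential processing.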

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Laplacian EigenMap · Residual Connection · Dense Connections · Layer Normalization · Attention Is All You Need · Byte Pair Encoding · Label Smoothing