Harnessing Temporal Causality for Advanced Temporal Action Detection

Shuming Liu; Lin Sui; Chen-Lin Zhang; Fangzhou Mu; Chen Zhao; Bernard Ghanem

arXiv:2407.17792 · cs.CV · July 29, 2024

TL;DR

This paper introduces CausalTAD, which exploits temporal causality in videos for temporal action detection by restricting the model's context to only the past or only the future, and reports state-of-the-art results on multiple benchmarks.

Contribution

The paper proposes a causality-based method for temporal action detection that combines causal attention and causal Mamba, in contrast to earlier temporal modeling modules that treat past and future information equally.

Findings

01. Achieved 1st place in multiple EPIC-Kitchens Challenge 2024 tracks.

02. Outperformed previous models on several benchmark datasets.

03. Demonstrated the effectiveness of causal modeling in temporal action detection.

Abstract

As a fundamental task in long-form video understanding, temporal action detection (TAD) aims to capture inherent temporal relations in untrimmed videos and identify candidate actions with precise boundaries. Over the years, various networks, including convolutions, graphs, and transformers, have been explored for effective temporal modeling for TAD. However, these modules typically treat past and future information equally, overlooking the crucial fact that changes in action boundaries are essentially causal events. Inspired by this insight, we propose leveraging the temporal causality of actions to enhance TAD representation by restricting the model's access to only past or future context. We introduce CausalTAD, which combines causal attention and causal Mamba to achieve state-of-the-art performance on multiple benchmarks. Notably, with CausalTAD, we ranked 1st in the Action…
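The idea of restricting a model to only past or only future context can be illustrated with a masked self-attention step. The following is a minimal sketch, not the authors' CausalTAD implementation: it assumes a single attention head with no learned projections, it omits the causal Mamba branch entirely, and the function name `directional_attention` and its `direction` parameter are hypothetical names chosen for illustration.

```python
import numpy as np

def directional_attention(x, direction="past"):
    """Single-head self-attention over a (T, D) feature sequence.

    Each time step attends only to itself and earlier steps ("past")
    or to itself and later steps ("future"). Illustrative only: there
    are no learned query/key/value projections here.
    """
    T, D = x.shape
    scores = x @ x.T / np.sqrt(D)  # (T, T) scaled dot-product scores

    if direction == "past":
        # Query t may see keys 0..t (lower triangle incl. diagonal).
        allowed = np.tril(np.ones((T, T), dtype=bool))
    elif direction == "future":
        # Query t may see keys t..T-1 (upper triangle incl. diagonal).
        allowed = np.triu(np.ones((T, T), dtype=bool))
    else:
        raise ValueError("direction must be 'past' or 'future'")

    # Disallowed positions get -inf so their softmax weight is exactly 0.
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

# Toy sequence: 6 time steps with 4-dimensional features (random data).
rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 4))
out_past, w_past = directional_attention(feats, "past")
out_future, w_future = directional_attention(feats, "future")
```

With the "past" mask, changing a later frame leaves the outputs at all earlier time steps unchanged, which is the property the abstract refers to as limiting access to past context; the "future" mask gives the mirrored behavior.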

Code & Models

Repositories

sming256/OpenTAD (PyTorch · Official)

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization

Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces