HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization

Sakib Reza; Yuexi Zhang; Mohsen Moghaddam; Octavia Camps

arXiv:2408.06437·cs.CV·August 14, 2024

TL;DR

The paper introduces HAT, a transformer-based framework that adds long-term historical context to online temporal action localization (OnTAL). It significantly outperforms existing methods on procedural egocentric (PREGO) datasets and achieves comparable performance on standard non-PREGO datasets.

Contribution

The paper presents the History-Augmented Anchor Transformer, which combines long-term history with short-term context to improve the anchor features used for action classification and localization.

Findings

1. Outperforms state-of-the-art methods on PREGO datasets
2. Achieves comparable performance on non-PREGO datasets
3. Highlights the importance of long-term history in egocentric scenarios

Abstract

Online video understanding often relies on individual frames, leading to frame-by-frame predictions. Recent advancements, such as Online Temporal Action Localization (OnTAL), extend this approach to instance-level predictions. However, existing methods mainly focus on short-term context, neglecting historical information. To address this, we introduce the History-Augmented Anchor Transformer (HAT) Framework for OnTAL. By integrating historical context, our framework enhances the synergy between long-term and short-term information, improving the quality of anchor features crucial for classification and localization. We evaluate our model on both procedural egocentric (PREGO) datasets (EGTEA and EPIC) and standard non-PREGO OnTAL datasets (THUMOS and MUSES). Results show that our model outperforms state-of-the-art approaches significantly on PREGO datasets and achieves comparable or…

Code & Models

Repositories

sakibreza/eccv24-hat (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications

Methods: Linear Layer · Layer Normalization · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections