Action-Guided Attention for Video Action Anticipation

Tsung-Ming Tai; Sofia Casarin; Andrea Pilzer; Werner Nutt; Oswald Lanz

arXiv:2603.01743 · cs.CV · May 12, 2026

TL;DR

This paper introduces Action-Guided Attention (AGA), a novel attention mechanism for video action anticipation that leverages predicted action sequences to improve generalization and interpretability.

Contribution

The paper proposes AGA, an attention method that explicitly uses predicted actions as queries and keys, enhancing sequence modeling and interpretability in video action anticipation.

Findings

01. AGA outperforms existing methods on EPIC-Kitchens-100.

02. The approach generalizes well to unseen test sets.

03. Post-training analysis reveals action dependencies and internalized evidence.

Abstract

Anticipating future actions in videos is challenging, as the observed frames provide only evidence of past activities, requiring the inference of latent intentions to predict upcoming actions. Existing transformer-based approaches, which rely on dot-product attention over pixel representations, often lack the high-level semantics necessary to model video sequences for effective action anticipation. As a result, these methods tend to overfit to explicit visual cues present in the past frames, limiting their ability to capture underlying intentions and degrading generalization to unseen samples. To address this, we propose Action-Guided Attention (AGA), an attention mechanism that explicitly leverages predicted action sequences as queries and keys to guide sequence modeling. Our approach encourages the attention module to emphasize relevant moments from the past based on the upcoming…
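To make the core idea concrete, here is a minimal NumPy sketch of attention whose queries and keys come from per-frame action predictions rather than from pixel features. It is an illustration under stated assumptions, not the authors' implementation: the abstract only states that predicted action sequences serve as queries and keys. The function name `action_guided_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the choice of frame features as values, the scaled dot-product form, and the causal mask are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def action_guided_attention(frame_feats, action_logits, w_q, w_k, w_v):
    """Single-head attention with queries/keys built from predicted actions.

    frame_feats:   (T, D) visual features, one row per observed time step
    action_logits: (T, C) per-step action predictions over C classes
    w_q, w_k:      (C, H) hypothetical projections of action probabilities
    w_v:           (D, H) hypothetical projection of visual features
    Returns the attended features (T, H) and the attention map (T, T).
    """
    # Assumption: actions enter the attention as class probabilities.
    action_probs = softmax(action_logits, axis=-1)

    q = action_probs @ w_q  # queries from predicted actions
    k = action_probs @ w_k  # keys from predicted actions
    v = frame_feats @ w_v   # assumption: values from visual features

    scores = q @ k.T / np.sqrt(q.shape[-1])

    # Assumption: each step may only attend to itself and earlier steps.
    t = scores.shape[0]
    causal = np.tril(np.ones((t, t), dtype=bool))
    scores = np.where(causal, scores, -np.inf)

    attn = softmax(scores, axis=-1)
    return attn @ v, attn


# Toy usage with random inputs: 4 time steps, 8-dim features, 5 action classes.
rng = np.random.default_rng(0)
T, D, C, H = 4, 8, 5, 6
out, attn = action_guided_attention(
    rng.normal(size=(T, D)),
    rng.normal(size=(T, C)),
    rng.normal(size=(C, H)),
    rng.normal(size=(C, H)),
    rng.normal(size=(D, H)),
)
print(out.shape, attn.shape)
```

Because the attention map in this sketch depends only on action predictions, each row of `attn` can be read as how strongly one step's predicted action relates to the actions predicted at earlier steps, which is one plausible way to picture the interpretability claim above.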

Videos

Action-Guided Attention for Video Action Anticipation · SlidesLive