STAA: Spatio-Temporal Attention Attribution for Real-Time Interpreting Transformer-based Video Models

Zerui Wang; Yan Liu

arXiv:2411.00630·cs.CV·November 4, 2024

TL;DR

STAA is a novel explainability method for video Transformer models that provides simultaneous spatial and temporal attributions with low computational cost, enabling real-time analysis.

Contribution

Introduces STAA, an XAI technique that offers combined spatial-temporal explanations from Transformer attention, optimized for real-time video analysis.

Findings

01

STAA produces more precise visual explanations.

02

Requires less than 3% of the computational resources of traditional XAI methods.

03

Effective in real-time video Transformer interpretability.

Abstract

Transformer-based models have achieved state-of-the-art performance in various computer vision tasks, including image and video analysis. However, the Transformer's complex architecture and black-box nature pose challenges for explainability, a crucial aspect for real-world applications and scientific inquiry. Current Explainable AI (XAI) methods provide only one-dimensional feature importance, either a spatial or a temporal explanation, and do so at significant computational cost. This paper introduces STAA (Spatio-Temporal Attention Attribution), an XAI method for interpreting video Transformer models. Unlike traditional methods, which separately apply image XAI techniques for spatial features or segment contribution analysis for temporal aspects, STAA offers both spatial and temporal information simultaneously from attention values in Transformers. The study utilizes the Kinetics-400…
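
The idea of reading spatial and temporal importance from the same attention values can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spatio_temporal_attribution`, the assumed input layout (attention from a classification token to every patch token, one row per head), and the choice to average over heads are all assumptions made for illustration.

```python
import numpy as np


def spatio_temporal_attribution(cls_attention, num_frames, grid_h, grid_w):
    """Split one attention vector into per-frame heatmaps and frame weights.

    Assumed (hypothetical) input layout: `cls_attention` has shape
    (num_heads, num_frames * grid_h * grid_w) and holds the attention that a
    classification token pays to each patch token, frame by frame.
    """
    attn = np.asarray(cls_attention, dtype=np.float64)
    expected = num_frames * grid_h * grid_w
    if attn.ndim != 2 or attn.shape[1] != expected:
        raise ValueError(f"expected shape (num_heads, {expected}), got {attn.shape}")

    # Average over heads (an assumption; other aggregations are possible).
    volume = attn.mean(axis=0).reshape(num_frames, grid_h, grid_w)

    # Temporal importance: total attention mass per frame, normalised to sum to 1.
    frame_mass = volume.sum(axis=(1, 2))
    total = frame_mass.sum()
    temporal = frame_mass / total if total > 0 else frame_mass

    # Spatial importance: each frame's map scaled so its peak equals 1.
    peak = volume.max(axis=(1, 2), keepdims=True)
    spatial = volume / np.where(peak > 0, peak, 1.0)
    return spatial, temporal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_attention = rng.random((12, 8 * 14 * 14))  # 12 heads, 8 frames, 14x14 patches
    spatial, temporal = spatio_temporal_attribution(fake_attention, 8, 14, 14)
    print(spatial.shape, temporal.shape)
```

Because both outputs come from a single attention tensor that the forward pass already computes, a sketch like this needs no extra model evaluations, which is consistent with the low overhead the abstract describes.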

Code & Models

Repositories

ZeruiW/VideoXAI
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dropout · Absolute Position Encodings