Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow

Behrooz Azarkhalili; Maxwell Libbrecht

arXiv:2502.15765 · cs.LG · February 25, 2025

TL;DR

This paper presents Generalized Attention Flow (GAF), a feature attribution method for Transformer models that combines attention weights, their gradients, and the maximum flow problem. In the authors' benchmarks on sequence classification tasks, a specific GAF variant outperforms existing attribution methods in most evaluation settings.

Contribution

GAF extends Attention Flow by replacing attention weights with a generalized Information Tensor, which integrates attention weights and their gradients, and computes feature attributions by solving a maximum flow problem with the barrier method.

Findings

01

A specific variant of GAF outperforms state-of-the-art attribution methods on sequence classification tasks in most evaluation settings.

02

The method exhibits desirable theoretical properties for interpretability.

03

GAF provides more reliable model explanations across various evaluation settings.

Abstract

This paper introduces Generalized Attention Flow (GAF), a novel feature attribution method for Transformer-based models to address the limitations of current approaches. By extending Attention Flow and replacing attention weights with the generalized Information Tensor, GAF integrates attention weights, their gradients, the maximum flow problem, and the barrier method to enhance the performance of feature attributions. The proposed method exhibits key theoretical properties and mitigates the shortcomings of prior techniques that rely solely on simple aggregation of attention weights. Our comprehensive benchmarking on sequence classification tasks demonstrates that a specific variant of GAF consistently outperforms state-of-the-art feature attribution methods in most evaluation settings, providing a more reliable interpretation of Transformer model outputs.
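The maximum-flow idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it follows the general Attention Flow recipe (a layered graph whose edge capacities come from per-layer token-to-token scores, with one max-flow computation per input token) and omits the barrier method entirely. The helper `information_tensor`, which clips the elementwise product of attention and gradient at zero, is a hypothetical stand-in for the paper's Information Tensor, and the toy numbers are made up for illustration.

```python
from collections import deque


def information_tensor(attn, grad):
    """Hypothetical capacity definition (an assumption, not the paper's formula):
    elementwise attention * gradient, clipped at zero so capacities stay non-negative."""
    return [[[max(a * g, 0.0) for a, g in zip(a_row, g_row)]
             for a_row, g_row in zip(a_layer, g_layer)]
            for a_layer, g_layer in zip(attn, grad)]


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dense capacity matrix."""
    n = len(capacity)
    residual = [row[:] for row in capacity]
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return total  # no augmenting path remains
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow along the path and update residual capacities.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def flow_attributions(info, target_token=0):
    """info[l][i][j] is the capacity from token j in layer l to token i in layer l + 1.
    Returns, for each input token, the max flow to `target_token` in the last layer."""
    num_layers, num_tokens = len(info), len(info[0])
    n = (num_layers + 1) * num_tokens

    def node(layer, tok):
        return layer * num_tokens + tok

    capacity = [[0.0] * n for _ in range(n)]
    for l, layer in enumerate(info):
        for i in range(num_tokens):
            for j in range(num_tokens):
                capacity[node(l, j)][node(l + 1, i)] = layer[i][j]
    sink = node(num_layers, target_token)
    return [max_flow(capacity, node(0, t), sink) for t in range(num_tokens)]


# Toy example: 2 layers, 2 tokens (made-up attention weights, rows sum to 1).
attn = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.7, 0.3], [0.5, 0.5]],
]
# All-ones gradients reduce the capacities to the raw attention weights.
grad = [[[1.0, 1.0], [1.0, 1.0]] for _ in attn]
scores = flow_attributions(information_tensor(attn, grad), target_token=0)
print(scores)  # approximately [1.0, 0.4]
```

With all-ones gradients the sketch reduces to flow over raw attention weights; substituting real gradients changes only the capacities, not the flow computation. Token 0 receives the larger score here because both of its paths to the target carry more capacity than those of token 1.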

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications

Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer