Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow
Behrooz Azarkhalili, Maxwell Libbrecht

TL;DR
This paper presents Generalized Attention Flow (GAF), a feature attribution method for Transformer models that combines attention weights, their gradients, and the maximum flow problem. In benchmarks on sequence classification tasks, a specific variant of GAF outperforms existing attribution methods in most evaluation settings.
Contribution
GAF extends Attention Flow by replacing raw attention weights with a generalized Information Tensor, which combines attention weights and their gradients, and by solving the resulting maximum flow problem with the barrier method.
Findings
A specific variant of GAF consistently outperforms state-of-the-art attribution methods on sequence classification tasks in most evaluation settings.
The method exhibits desirable theoretical properties for interpretability.
GAF yields more reliable explanations of Transformer outputs than methods that rely solely on simple aggregation of attention weights.
Abstract
This paper introduces Generalized Attention Flow (GAF), a novel feature attribution method for Transformer-based models to address the limitations of current approaches. By extending Attention Flow and replacing attention weights with the generalized Information Tensor, GAF integrates attention weights, their gradients, the maximum flow problem, and the barrier method to enhance the performance of feature attributions. The proposed method exhibits key theoretical properties and mitigates the shortcomings of prior techniques that rely solely on simple aggregation of attention weights. Our comprehensive benchmarking on sequence classification tasks demonstrates that a specific variant of GAF consistently outperforms state-of-the-art feature attribution methods in most evaluation settings, providing a more reliable interpretation of Transformer model outputs.
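The idea of treating attribution as a maximum flow problem can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical Information Tensor formed as attention times the positive part of its gradient, uses a plain Edmonds-Karp solver rather than the barrier method named in the abstract, and scores each input token by the maximum flow from that token to a chosen output position, in the style of the original Attention Flow. All function names and the toy values are illustrative.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp on a dense capacity matrix; returns the max-flow value."""
    n = len(capacity)
    residual = [row[:] for row in capacity]
    total = 0.0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return total  # no augmenting path left
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow along the path and update reverse edges.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def information_tensor(attention, grads):
    """ASSUMPTION: capacity = attention * max(gradient, 0), elementwise."""
    return [
        [[a * max(g, 0.0) for a, g in zip(a_row, g_row)]
         for a_row, g_row in zip(a_layer, g_layer)]
        for a_layer, g_layer in zip(attention, grads)
    ]


def flow_attributions(info, target):
    """info[l][i][j] is the capacity from token j at layer l to token i at layer l+1.

    Returns, for each input token, the max flow from it to `target` at the top layer.
    """
    num_layers, n = len(info), len(info[0])
    size = (num_layers + 1) * n
    cap = [[0.0] * size for _ in range(size)]
    for l, layer in enumerate(info):
        for i in range(n):
            for j in range(n):
                cap[l * n + j][(l + 1) * n + i] = layer[i][j]
    sink = num_layers * n + target
    return [max_flow(cap, j, sink) for j in range(n)]


# Toy example: 2 layers, 2 tokens, head-averaged attention, unit gradients.
attention = [
    [[0.5, 0.5], [0.2, 0.8]],
    [[0.6, 0.4], [0.1, 0.9]],
]
grads = [[[1.0, 1.0], [1.0, 1.0]] for _ in attention]
scores = flow_attributions(information_tensor(attention, grads), target=0)
print(scores)
```

In this toy case the scores come out at roughly 0.7 for token 0 and 0.9 for token 1. Per-token max flows of this kind need not be unique in how flow is routed through the graph, which is one reason a regularized solver such as the barrier method may be preferable in practice.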
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications
Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer
