Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
Melkamu Abay Mersha, Jugal Kalita

TL;DR
The paper introduces CA-LIG, a hierarchical attribution framework that enhances the interpretability of Transformer models by integrating layer-wise relevance with context-aware, class-specific attention gradients, providing clearer and more faithful explanations.
Contribution
It proposes a novel unified hierarchical attribution method, CA-LIG, that captures relevance flow across layers and structural components, improving explainability of Transformer models.
Findings
CA-LIG produces more faithful attributions across diverse tasks.
It shows stronger sensitivity to contextual dependencies.
It yields clearer, more semantically coherent visualizations.
Abstract
Transformer models achieve state-of-the-art performance across domains and tasks, yet their deeply layered representations make their predictions difficult to interpret. Existing explainability methods rely on final-layer attributions, capture either local token-level attributions or global attention patterns without unification, and lack context-awareness of inter-token dependencies and structural components. They also fail to capture how relevance evolves across layers and how structural components shape decision-making. To address these limitations, we propose the Context-Aware Layer-wise Integrated Gradients (CA-LIG) Framework, a unified hierarchical attribution framework that computes layer-wise Integrated Gradients within each Transformer block and fuses these token-level attributions with class-specific attention gradients. This integration yields signed,…
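For readers unfamiliar with the building block named in the abstract, the following is a minimal sketch of standard Integrated Gradients on a toy logistic model, not the CA-LIG method itself. The model, weights, input, and all-zeros baseline are hypothetical values chosen for illustration. The abstract excerpt does not specify how the layer-wise attributions are fused with attention gradients, so that step is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    # Toy stand-in for a classifier score: sigmoid(w . x + b).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    # Analytic gradient of the toy model with respect to the input x.
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    # Average the gradient along the straight path from baseline to x
    # (midpoint Riemann sum), then scale by (x - baseline) per feature.
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        g = model_grad(point, w, b)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - bi) * gi for xi, bi, gi in zip(x, baseline, avg_grad)]

# Hypothetical weights, bias, input, and baseline (assumptions for illustration).
w = [2.0, -1.0, 0.5]
b = 0.1
x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)

# Completeness check: attributions should sum to model(x) - model(baseline).
gap = abs(sum(attributions) - (model(x, w, b) - model(baseline, w, b)))
```

The attributions are signed: a feature that pushes the score up relative to the baseline gets a positive value, and one that pushes it down gets a negative value. A layer-wise variant would apply the same path integral to the hidden representation entering a given Transformer block rather than to the raw input.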
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
