COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP tasks
Fanny Jourdan, Agustin Picard, Thomas Fel, Laurent Risser, Jean-Michel Loubes, Nicholas Asher

TL;DR
COCKATIEL is a novel, post-hoc, concept-based explainability method for NLP classifiers that uses NMF and sensitivity analysis to generate faithful, human-aligned explanations without retraining models.
Contribution
It introduces COCKATIEL, a model-agnostic, concept-based XAI technique that improves interpretability of Transformer models in NLP tasks by discovering meaningful concepts and estimating their importance.
Findings
COCKATIEL effectively discovers human-aligned concepts in Transformer models.
It maintains model accuracy while providing explanations.
It demonstrates superior faithfulness and interpretability in sentiment analysis tasks.
Abstract
Transformer architectures are complex and their use in NLP, while it has engendered many successes, makes their interpretability or explainability challenging. Recent debates have shown that attention maps and attribution methods are unreliable (Pruthi et al., 2019; Brunner et al., 2019). In this paper, we present some of their limitations and introduce COCKATIEL, which successfully addresses some of them. COCKATIEL is a novel, post-hoc, concept-based, model-agnostic XAI technique that generates meaningful explanations from the last layer of a neural net model trained on an NLP classification task by using Non-Negative Matrix Factorization (NMF) to discover the concepts the model leverages to make predictions and by exploiting a Sensitivity Analysis to estimate accurately the importance of each of these concepts for the model. It does so without compromising the accuracy of the…
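The two steps the abstract describes (factorizing last-layer activations with NMF, then scoring each concept's importance) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the activations are a toy non-negative matrix, the classification head is a hypothetical linear map `v`, the NMF uses plain multiplicative updates, and the importance score is a crude random-masking proxy rather than the sensitivity analysis estimator used in the paper. The names `nmf`, `concept_importance`, and `head` are all hypothetical.

```python
import numpy as np

def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative A (n x p) as U @ W, with U (n x k) and W (k x p),
    using multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    U = rng.random((n, k)) + eps
    W = rng.random((k, p)) + eps
    for _ in range(n_iter):
        U *= (A @ W.T) / (U @ W @ W.T + eps)
        W *= (U.T @ A) / (U.T @ U @ W + eps)
    return U, W

def concept_importance(U, W, head, n_samples=64, seed=0):
    """Assumed proxy for concept importance: mean absolute change in the head's
    output when one concept's coefficients are rescaled by a random factor in [0, 1)."""
    rng = np.random.default_rng(seed)
    base = head(U @ W)
    scores = np.zeros(U.shape[1])
    for j in range(U.shape[1]):
        diffs = []
        for _ in range(n_samples):
            U_pert = U.copy()
            U_pert[:, j] *= rng.random()  # randomly damp concept j only
            diffs.append(np.abs(head(U_pert @ W) - base).mean())
        scores[j] = np.mean(diffs)
    return scores

# Toy data: 20 "examples" with 6 non-negative activation features of rank 2.
rng = np.random.default_rng(0)
A = rng.random((20, 2)) @ rng.random((2, 6))
v = rng.normal(size=6)  # hypothetical linear classification head

U, W = nmf(A, k=2)  # rows of W: concept directions; U: per-example concept coefficients
scores = concept_importance(U, W, head=lambda X: X @ v)
print(scores)  # one non-negative importance score per concept
```

In this sketch, the rows of `W` play the role of concepts and the rows of `U` say how strongly each example expresses them. Because the head is applied to the reconstruction `U @ W`, the underlying model is left untouched, which matches the post-hoc setting described above.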
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning in Materials Science
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Adam · Position-Wise Feed-Forward Layer · Multi-Head Attention · Absolute Position Encodings · Softmax · Layer Normalization · Label Smoothing
