COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP tasks

Fanny Jourdan; Agustin Picard; Thomas Fel; Laurent Risser; Jean-Michel Loubes; Nicholas Asher

arXiv:2305.06754 · cs.CL · June 26, 2023 · 1 citation

TL;DR

COCKATIEL is a novel, post-hoc, concept-based explainability method for NLP classifiers that uses NMF and sensitivity analysis to generate faithful, human-aligned explanations without retraining models.

Contribution

It introduces COCKATIEL, a model-agnostic, concept-based XAI technique that improves interpretability of Transformer models in NLP tasks by discovering meaningful concepts and estimating their importance.

Findings

01 COCKATIEL effectively discovers human-aligned concepts in Transformer models.

02 It maintains model accuracy while providing explanations.

03 It demonstrates superior faithfulness and interpretability in sentiment analysis tasks.

Abstract

Transformer architectures are complex, and although their use in NLP has engendered many successes, it makes interpretability or explainability challenging. Recent debates have shown that attention maps and attribution methods are unreliable (Pruthi et al., 2019; Brunner et al., 2019). In this paper, we present some of their limitations and introduce COCKATIEL, which successfully addresses some of them. COCKATIEL is a novel, post-hoc, concept-based, model-agnostic XAI technique that generates meaningful explanations from the last layer of a neural net model trained on an NLP classification task. It uses Non-Negative Matrix Factorization (NMF) to discover the concepts the model leverages to make predictions, and it exploits a Sensitivity Analysis to accurately estimate the importance of each of these concepts for the model. It does so without compromising the accuracy of the…
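The two steps named in the abstract, NMF for concept discovery and a sensitivity analysis for concept importance, can be illustrated with a minimal sketch. This is not the authors' implementation: the activations are synthetic, the linear head `w_head` is hypothetical, `nmf` uses plain Lee-Seung multiplicative updates, and `total_sobol` is a simple Jansen-style Monte Carlo estimate of total Sobol indices over per-concept masks. All function and variable names are assumptions made for illustration.

```python
import numpy as np

def nmf(A, k, n_iter=300, seed=0, eps=1e-9):
    """Factor a non-negative matrix A (n x p) as U (n x k) @ W (k x p)
    using Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    U = rng.random((n, k)) + eps
    W = rng.random((k, p)) + eps
    for _ in range(n_iter):
        U *= (A @ W.T) / (U @ W @ W.T + eps)
        W *= (U.T @ A) / (U.T @ U @ W + eps)
    return U, W

def total_sobol(f, k, n_samples=4000, seed=0):
    """Jansen-style Monte Carlo estimate of the total Sobol index of each
    of k independent masks drawn uniformly from [0, 1]."""
    rng = np.random.default_rng(seed)
    M = rng.random((n_samples, k))
    M_new = rng.random((n_samples, k))
    f_base = np.array([f(m) for m in M])
    var = f_base.var() + 1e-12  # guard against a constant output
    indices = np.empty(k)
    for i in range(k):
        M_i = M.copy()
        M_i[:, i] = M_new[:, i]  # resample only the mask of concept i
        f_i = np.array([f(m) for m in M_i])
        indices[i] = np.mean((f_base - f_i) ** 2) / (2.0 * var)
    return indices

# Synthetic stand-in for non-negative last-layer activations (assumption).
rng = np.random.default_rng(42)
n, p, k = 60, 16, 3
A = rng.random((n, k)) @ rng.random((k, p))
w_head = rng.normal(size=p)  # hypothetical linear classification head

# Step 1: concept discovery. Rows of W are concept directions in
# activation space; U holds each sample's concept coefficients.
U, W = nmf(A, k)
rel_err = np.linalg.norm(A - U @ W) / np.linalg.norm(A)

def masked_logit(mask):
    # Scale each concept coefficient by its mask, map back to activation
    # space, apply the head, and average the logit over all samples.
    return float(np.mean(((U * mask) @ W) @ w_head))

# Step 2: concept importance via sensitivity to masking each concept.
importance = total_sobol(masked_logit, k)
ranking = np.argsort(importance)[::-1]
print("relative reconstruction error:", rel_err)
print("concept importance:", importance, "ranking:", ranking)
```

Because the head in this sketch is linear, the output is additive in the masks and the estimated indices should sum to roughly 1. With a non-linear classifier head the total indices would also capture interactions between concepts, so the sum can exceed 1.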

Code & Models

Repositories

fanny-jourdan/cockatiel (PyTorch, official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning in Materials Science

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Adam · Position-Wise Feed-Forward Layer · Multi-Head Attention · Absolute Position Encodings · Softmax · Layer Normalization · Label Smoothing