Latent Concept-based Explanation of NLP Models
Xuemin Yu, Fahim Dalvi, Nadir Durrani, Marzia Nouri, Hassan Sajjad

TL;DR
This paper introduces LACOAT, a method that explains NLP model predictions by mapping input words into a latent space to reveal context-dependent facets, offering more informative insights than traditional word-based explanations.
Contribution
The paper proposes a novel latent concept-based explanation method for NLP models, addressing limitations of word-based explanations by capturing context-dependent facets of words.
Findings
LACOAT provides more nuanced explanations of model predictions.
The method effectively captures multiple facets of words based on context.
It enhances interpretability of NLP models through latent space analysis.
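The finding that one word can carry several context-dependent facets can be illustrated with a small clustering sketch. It assumes that latent concepts are obtained by clustering contextual token representations. The reviews mention clustering, but this page does not name the algorithm, so single-linkage agglomerative clustering with a distance threshold is used here as a stand-in. The 2-D vectors in `TOKENS` are hypothetical values, not outputs of any model.

```python
import math

# Hypothetical 2-D "contextual representations" (illustrative values only).
# Each entry is (word, context tag, vector).
TOKENS = [
    ("bank", "finance", (0.90, 0.10)),
    ("loan", "finance", (0.85, 0.15)),
    ("bank", "river", (0.10, 0.90)),
    ("shore", "river", (0.15, 0.85)),
]


def agglomerative(points, threshold):
    """Single-linkage agglomerative clustering (a stand-in, not LACOAT's code):
    repeatedly merge the two closest clusters until the closest pair is
    farther apart than `threshold`. Returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i stays valid
    return clusters


vectors = [vec for _, _, vec in TOKENS]
concepts = agglomerative(vectors, threshold=0.5)
for k, members in enumerate(concepts):
    print(k, [f"{TOKENS[m][0]}/{TOKENS[m][1]}" for m in members])
```

With these toy values the two occurrences of "bank" fall into different clusters, one grouped with "loan" and one with "shore", which is the kind of facet separation the findings describe. Real contextual representations have far more dimensions, and the threshold would need tuning.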
Abstract
Interpreting and understanding the predictions made by deep learning models poses a formidable challenge due to their inherently opaque nature. Many previous efforts aimed at explaining these predictions rely on input features, specifically, the words within NLP models. However, such explanations are often less informative due to the discrete nature of these words and their lack of contextual verbosity. To address this limitation, we introduce the Latent Concept Attribution method (LACOAT), which generates explanations for predictions based on latent concepts. Our foundational intuition is that a word can exhibit multiple facets, contingent upon the context in which it is used. Therefore, given a word in context, the latent space derived from our training process reflects a specific facet of that word. LACOAT functions by mapping the representations of salient input words into the…
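The abstract describes LACOAT as mapping the representations of salient input words into latent concepts. A minimal sketch of that two-step flow follows, under stated assumptions. Saliency is computed with gradient × input on a hypothetical toy linear classifier (weights `W`), and mapping uses nearest-centroid assignment to hypothetical concept centroids (`CONCEPT_CENTROIDS`). This page does not say which attribution method or mapping model LACOAT actually uses, so both are swapped-in stand-ins.

```python
import math

# Hypothetical latent concepts: centroids assumed to come from clustering
# contextual representations offline (illustrative values only).
CONCEPT_CENTROIDS = {
    "finance-sense": (0.875, 0.125),
    "river-sense": (0.125, 0.875),
}

# Hypothetical weights of a toy linear classifier: score = sum_i dot(W, x_i).
W = (1.0, -1.0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gradient_x_input(tokens):
    """For the linear score sum_i dot(W, x_i), the gradient w.r.t. x_i is W,
    so gradient x input summed over dimensions is dot(W, x_i)."""
    return [(word, dot(W, vec)) for word, vec in tokens]


def most_salient(tokens):
    """Return the (word, vector) pair with the largest absolute attribution."""
    scores = gradient_x_input(tokens)
    idx = max(range(len(scores)), key=lambda i: abs(scores[i][1]))
    return tokens[idx]


def map_to_concept(vec):
    """Assign a representation to its nearest concept centroid (Euclidean)."""
    return min(CONCEPT_CENTROIDS,
               key=lambda c: math.dist(vec, CONCEPT_CENTROIDS[c]))


def explain(tokens):
    """Explain a prediction by the latent concept of its most salient word."""
    word, vec = most_salient(tokens)
    return word, map_to_concept(vec)


# Same surface word, different (hypothetical) contextual representations.
sentence_a = [("the", (0.5, 0.5)), ("bank", (0.9, 0.1)), ("approved", (0.6, 0.4))]
sentence_b = [("the", (0.5, 0.5)), ("bank", (0.1, 0.9)), ("flooded", (0.45, 0.55))]
print(explain(sentence_a))
print(explain(sentence_b))
```

In both toy sentences "bank" is the most salient token, but its context-specific representation lands in a different concept, so the explanation differs even though the input word is the same. Because the score is linear, gradient × input reduces to a dot product here; a non-linear model would need automatic differentiation.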
Peer Reviews
Decision: Submitted to ICLR 2024
- Explaining LLM predictions with concepts and natural language is an interesting research direction which is beneficial to broader users of NLP systems.
- The break-down evaluation of each component of the proposed method in Sec. 3.3 is useful.
**In the current state, the most significant weakness of the paper is the experiments.**
- The paper lacks comparison to other explanation algorithms in the experiments.
- The quality of the generated natural language explanations is evaluated with case studies only. I understand that evaluating explanation-based algorithms is tricky, especially for the natural language explanations that the authors study. To evaluate the utility of explanations, the authors could perform human evaluation. Here I su…
1) The motivation is sound. I understand and accept that we need our models to be able to explain their predictions, for numerous reasons, and doing this via the latent concepts the model has encoded is a great way to try.
1) I think a major weakness of the overall method is poor scalability. Clustering at scale would be quite expensive, and large pretraining datasets are in the 3T-token range (e.g. Dolma), which would definitely be infeasible.
2) I think the experimentation could be extended. POS tagging and sentiment are very small and relatively simple tasks. The types of models that you're fine-tuning are also not very broad (no decoder-only models like GPT here). What happens when you evaluate on an NLI task? Could you…
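The scalability concern in the second review can be made concrete with a back-of-the-envelope count. The sketch assumes a clustering method that computes all pairwise distances between token representations, as standard agglomerative clustering does. Whether LACOAT clusters every token this way is not stated on this page, and approximate or mini-batch clustering methods avoid the quadratic term. The helper names are hypothetical, and the 3T-token figure is the one quoted by the reviewer.

```python
def pairwise_distance_count(n_tokens: int) -> int:
    """Number of unordered token pairs, n(n-1)/2, that an exhaustive
    pairwise-distance clustering method must consider."""
    return n_tokens * (n_tokens - 1) // 2


def distance_matrix_bytes(n_tokens: int, bytes_per_entry: int = 4) -> int:
    """Memory for the condensed pairwise distance matrix (float32 = 4 bytes)."""
    return pairwise_distance_count(n_tokens) * bytes_per_entry


# From a small probing set up to the reviewer's 3T-token pretraining scale.
for n in (10_000, 1_000_000, 3 * 10**12):
    pairs = pairwise_distance_count(n)
    terabytes = distance_matrix_bytes(n) / 1e12
    print(f"{n:.1e} tokens -> {pairs:.2e} pairs, {terabytes:.2e} TB")
```

For one million tokens the condensed float32 distance matrix alone already needs about 2 TB, and at 3T tokens the pair count is on the order of 10^24, which supports the reviewer's point that exhaustive clustering over pretraining-scale corpora is infeasible.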
- S1. The paper raises an interesting question about ambiguous natural language explanation and tries to disambiguate the sense (latent concept).
- S2. The paper provides examples and experiments.
- W1. The components are loosely connected, which isn't necessarily a bad thing by itself. However, each component is either simplistic or an existing approach.
- W2. The evaluation is done with automatically generated labels, and in this particular case they can be deceptive because of the last-layer assumption used to generate them. Also, if they are the targets, one could simply adopt the method used to generate the labels in place of the proposed method.
- W3. The evaluation is limited to simplisti…
Taxonomy
Topics: Semantic Web and Ontologies
