Latent Concept-based Explanation of NLP Models

Xuemin Yu; Fahim Dalvi; Nadir Durrani; Marzia Nouri; Hassan Sajjad

arXiv:2404.12545 · cs.CL · October 10, 2024

TL;DR

This paper introduces LACOAT, a method that explains NLP model predictions by mapping input words into a latent space to reveal context-dependent facets, offering more informative insights than traditional word-based explanations.

Contribution

The paper proposes a novel latent concept-based explanation method for NLP models, addressing limitations of word-based explanations by capturing context-dependent facets of words.

Findings

1. LACOAT provides more nuanced explanations of model predictions.
2. The method effectively captures multiple facets of words based on context.
3. It enhances interpretability of NLP models through latent space analysis.

Abstract

Interpreting and understanding the predictions made by deep learning models poses a formidable challenge due to their inherently opaque nature. Many previous efforts aimed at explaining these predictions rely on input features, specifically, the words within NLP models. However, such explanations are often less informative due to the discrete nature of these words and their lack of contextual verbosity. To address this limitation, we introduce the Latent Concept Attribution method (LACOAT), which generates explanations for predictions based on latent concepts. Our foundational intuition is that a word can exhibit multiple facets, contingent upon the context in which it is used. Therefore, given a word in context, the latent space derived from our training process reflects a specific facet of that word. LACOAT functions by mapping the representations of salient input words into the…
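The core idea in the abstract, that one word can land in different latent concepts depending on its context, can be illustrated with a small sketch. This is not the authors' implementation: the 2-D vectors, the example words, the use of plain k-means, and the helper names (`kmeans`, `assign_concept`) are all assumptions made for illustration. A real pipeline would take hidden-state vectors from a trained model.

```python
import math

# Hypothetical 2-D vectors standing in for the contextual representations
# a real model would produce for each (word, context) pair.
TRAIN_REPS = {
    ("bank", "river bank was muddy"): (0.9, 0.1),
    ("shore", "walked along the shore"): (1.0, 0.2),
    ("bank", "bank approved the loan"): (0.1, 0.9),
    ("lender", "the lender raised rates"): (0.2, 1.0),
}


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def assign_concept(rep, centroids):
    """Index of the centroid (latent concept) nearest to a representation."""
    return min(range(len(centroids)), key=lambda k: math.dist(rep, centroids[k]))


def kmeans(points, centroids, iterations=10):
    """Plain k-means; an assumed stand-in for whatever clustering is used."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[assign_concept(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


points = list(TRAIN_REPS.values())
centroids = kmeans(points, centroids=[points[0], points[2]])

# Each latent concept is described by the training words assigned to it.
concepts = {}
for (word, _context), rep in TRAIN_REPS.items():
    concepts.setdefault(assign_concept(rep, centroids), []).append(word)

# Hypothetical representation of "bank" in a new finance-related sentence.
new_rep = (0.15, 0.85)
explanation = concepts[assign_concept(new_rep, centroids)]
print(explanation)  # ['bank', 'lender']
```

In this toy setup the word "bank" appears in both clusters, and the new occurrence is explained by the finance-flavoured cluster rather than by the bare word itself.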

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

- Explaining LLM predictions with concepts and natural language is an interesting research direction which is beneficial to broader users of NLP systems.
- Break-down evaluation of each component in the proposed method in Sec. 3.3 is useful.

Weaknesses

**In its current state, the paper's most significant weakness is the experiments.**

- The paper lacks comparison to other explanation algorithms in the experiments.
- The quality of the generated natural language explanations is evaluated with case studies only. I understand that evaluating explanation-based algorithms is tricky, especially for the natural language explanations that the authors study. To evaluate the utility of explanations, the authors can perform human evaluation. Here I su

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 2

Strengths

1) The motivation is sound. I understand and buy the fact that we need our models to be able to explain predictions for numerous reasons and having a way to do this via the latent concepts the model has encoded is a great way to try and do this.

Weaknesses

1) I think a major weakness of the overall method is poor scalability. Clustering at scale would be quite expensive, and large pretraining datasets are in the 3T total token range (e.g. Dolma), which would definitely be infeasible.
2) I think the experimentation could be extended. POS and sentiment are very small and relatively simple tasks. The types of models that you're fine-tuning are also not very broad (no decoder-only models like GPT here). What happens when you evaluate on an NLI task? Could you

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 2

Strengths

- S1. The paper raises an interesting question about ambiguous natural language explanation and tries to disambiguate the sense (latent concept).
- S2. The paper provides examples and experiments.

Weaknesses

- W1. The components are loosely connected, which isn't necessarily a bad thing by itself. However, each component is either simplistic or an existing approach.
- W2. The evaluation is done with automatically generated labels, and in this particular case, they can be deceptive because of the last-layer assumption used to generate them. Also, if they are the targets, one can just adopt the method used to generate the labels to replace the proposed method.
- W3. The evaluation is limited to simplisti

Code & Models

Repositories

xuemin-yu/eraser_movie_latentconcept (Official)

Taxonomy

Topics: Semantic Web and Ontologies