Variational Language Concepts for Interpreting Foundation Language Models
Hengyi Wang, Shiwei Tan, Zhiqing Hong, Desheng Zhang, Hao Wang

TL;DR
This paper introduces a variational Bayesian framework called VALC that enhances interpretability of foundation language models by providing concept-level explanations beyond traditional attention-based word interpretations.
Contribution
The paper proposes a variational Bayesian method for concept-level interpretation of foundation language models (FLMs), addressing the limitations of attention-based, word-level explanations.
Findings
VALC effectively identifies meaningful language concepts
The method outperforms attention-based interpretability approaches
Empirical results on several real-world datasets support VALC's interpretations of FLM predictions
Abstract
Foundation Language Models (FLMs) such as BERT and its variants have achieved remarkable success in natural language processing. To date, the interpretability of FLMs has primarily relied on the attention weights in their self-attention layers. However, these attention weights only provide word-level interpretations, failing to capture higher-level structures, and are therefore lacking in readability and intuitiveness. To address this challenge, we first provide a formal definition of conceptual interpretation and then propose a variational Bayesian framework, dubbed VAriational Language Concept (VALC), to go beyond word-level interpretations and provide concept-level interpretations. Our theoretical analysis shows that our VALC finds the optimal language concepts to interpret FLM predictions. Empirical results on several real-world datasets show that our method can successfully provide…
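As a minimal sketch of the word-level attention interpretation that the abstract contrasts against, the snippet below computes scaled dot-product attention weights for one query over a toy sentence and ranks the tokens by weight. The tokens, the vectors, and the function name `attention_weights` are illustrative assumptions, not taken from the paper, and the snippet does not implement VALC itself.

```python
import math


def attention_weights(query, keys):
    """Softmax over scaled dot products between one query and each key."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy inputs (hypothetical): one 2-d key vector per token.
tokens = ["the", "movie", "was", "great"]
keys = [[0.1, 0.0], [0.5, 0.2], [0.0, 0.1], [0.9, 0.8]]
query = [1.0, 1.0]

weights = attention_weights(query, keys)

# A word-level "explanation": tokens ranked by attention weight.
ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
for token, weight in ranked:
    print(f"{token}: {weight:.3f}")
```

The output is a single score per token, which illustrates the limitation described above: such a ranking says which words received weight, but not which higher-level concept they jointly express.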
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · WordPiece · Attention Dropout
