Variational Language Concepts for Interpreting Foundation Language Models

Hengyi Wang; Shiwei Tan; Zhiqing Hong; Desheng Zhang; Hao Wang

arXiv:2410.03964 · cs.LG · October 30, 2024

TL;DR

This paper introduces VALC (VAriational Language Concept), a variational Bayesian framework that interprets foundation language models at the concept level, going beyond the word-level interpretations that attention weights provide.

Contribution

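The paper gives a formal definition of conceptual interpretation and proposes a variational Bayesian method for concept-level interpretation of foundation language models (FLMs). This addresses a limitation of attention-based explanations, which operate only at the word level and fail to capture higher-level structures.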

Findings

01 VALC effectively identifies meaningful language concepts

02 The method outperforms attention-based interpretability approaches

03 Empirical results validate the interpretability of FLMs using VALC

Abstract

Foundation Language Models (FLMs) such as BERT and its variants have achieved remarkable success in natural language processing. To date, the interpretability of FLMs has primarily relied on the attention weights in their self-attention layers. However, these attention weights only provide word-level interpretations, failing to capture higher-level structures, and are therefore lacking in readability and intuitiveness. To address this challenge, we first provide a formal definition of conceptual interpretation and then propose a variational Bayesian framework, dubbed VAriational Language Concept (VALC), to go beyond word-level interpretations and provide concept-level interpretations. Our theoretical analysis shows that our VALC finds the optimal language concepts to interpret FLM predictions. Empirical results on several real-world datasets show that our method can successfully provide…

Code & Models

Repositories

Wang-ML-Lab/interpretable-foundation-models
PyTorch · Official

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · WordPiece · Attention Dropout