Explaining black box text modules in natural language with language models

Chandan Singh; Aliyah R. Hsu; Richard Antonello; Shailee Jain; Alexander G. Huth; Bin Yu; Jianfeng Gao

arXiv:2305.09863·cs.AI·November 16, 2023·6 cites

TL;DR

This paper introduces SASC, a method for automatically generating natural language explanations for black box text modules, including neural network components and brain regions, enhancing interpretability across AI and neuroscience.

Contribution

The paper presents SASC, a novel approach that provides natural language explanations and reliability scores for black box text modules, applicable to models and brain data.

Findings

1. SASC often recovers ground truth explanations in synthetic modules.
2. It enables inspection of internal modules within BERT.
3. It can generate explanations for individual fMRI voxels' responses.

Abstract

Large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their rapid proliferation and increasing opaqueness have created a growing need for interpretability. Here, we ask whether we can automatically obtain natural language explanations for black box text modules. A "text module" is any function that maps text to a scalar continuous value, such as a submodule within an LLM or a fitted model of a brain region. "Black box" indicates that we only have access to the module's inputs/outputs. We introduce Summarize and Score (SASC), a method that takes in a text module and returns a natural language explanation of the module's selectivity along with a score for how reliable the explanation is. We study SASC in 3 contexts. First, we evaluate SASC on synthetic modules and find that it often recovers ground truth explanations.…
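The abstract's definition of a text module (a function from text to a scalar) and the idea of pairing an explanation with a reliability score can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy module `toy_module`, the helper names `top_ngrams` and `reliability_score`, and the choice to score an explanation as the gap in mean response between texts related and unrelated to it. The step that turns the top n-grams into a natural language explanation (done by a language model in SASC) is deliberately left out.

```python
from statistics import mean
from typing import Callable, List, Sequence

# A "text module" in the abstract's sense: any function mapping text to a scalar.
TextModule = Callable[[str], float]

# Hypothetical selectivity of the toy module (an assumption for this sketch).
FOOD_WORDS = {"pizza", "bread", "soup", "apple"}


def toy_module(text: str) -> float:
    """Toy black box: fraction of tokens that are food words."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in FOOD_WORDS for tok in tokens) / len(tokens)


def top_ngrams(module: TextModule, corpus: Sequence[str], n: int = 1, k: int = 3) -> List[str]:
    """Return the k distinct n-grams with the largest module response.

    Only inputs and outputs of `module` are used, matching the black box setting.
    Ties are broken alphabetically so the result is deterministic.
    """
    ngrams = set()
    for doc in corpus:
        toks = doc.lower().split()
        for i in range(len(toks) - n + 1):
            ngrams.add(" ".join(toks[i:i + n]))
    return sorted(ngrams, key=lambda g: (-module(g), g))[:k]


def reliability_score(module: TextModule, related: Sequence[str], unrelated: Sequence[str]) -> float:
    """Illustrative score: mean response to texts matching a candidate
    explanation minus mean response to texts that do not match it."""
    return mean(module(t) for t in related) - mean(module(t) for t in unrelated)


corpus = ["I ate pizza and soup", "The car needs fuel", "Fresh bread and apple pie"]
candidates = top_ngrams(toy_module, corpus)
score = reliability_score(toy_module, ["pizza soup", "bread apple"], ["car fuel", "the road"])
```

In this sketch a score near zero would suggest the candidate explanation does not capture what the module responds to, while a large positive score would suggest it does. The same interface applies whether the module wraps a model component or a fitted model of a brain region, since only text in and a scalar out are required.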

Taxonomy

Topics: Machine Learning in Materials Science · Explainable Artificial Intelligence (XAI) · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Adam · Linear Warmup With Linear Decay · Softmax · Layer Normalization · Dropout