Crafting Large Language Models for Enhanced Interpretability

Chung-En Sun; Tuomas Oikarinen; Tsui-Wei Weng

arXiv:2407.04307 · cs.CL · July 8, 2024 · 2 cites

TL;DR

This paper presents CB-LLM, a large language model made inherently interpretable through a concept bottleneck. It also introduces Automatic Concept Correction (ACC), a strategy that narrows the accuracy gap with black-box models.

Contribution

The paper introduces CB-LLM, a novel interpretable LLM architecture with automatic concept correction, advancing transparency and scalability in language models.

Findings

01 CB-LLM achieves comparable accuracy to traditional LLMs.

02 Automatic Concept Correction improves interpretability without performance loss.

03 CB-LLM enhances transparency and scalability in language models.

Abstract

We introduce the Concept Bottleneck Large Language Model (CB-LLM), a pioneering approach to creating inherently interpretable Large Language Models (LLMs). Unlike traditional black-box LLMs that rely on post-hoc interpretation methods with limited neuron function insights, CB-LLM sets a new standard with its built-in interpretability, scalability, and ability to provide clear, accurate explanations. This innovation not only advances transparency in language models but also enhances their effectiveness. Our unique Automatic Concept Correction (ACC) strategy successfully narrows the performance gap with conventional black-box LLMs, positioning CB-LLM as a model that combines the high accuracy of traditional LLMs with the added benefit of clear interpretability -- a feature markedly absent in existing LLMs.
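
To make the idea of a concept bottleneck concrete, here is a minimal sketch of the general technique, not the authors' implementation. It assumes that some upstream model has already produced one score per human-readable concept, and that the final prediction is a plain linear layer over those scores. The concept names, class names, weights, and the function `predict_with_explanation` are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical concepts and classes (illustrative assumptions, not from the paper).
CONCEPTS = ["positive wording", "negative wording", "mentions price"]
CLASSES = ["negative review", "positive review"]

# Hypothetical final-layer weights: FINAL_WEIGHTS[class_index][concept_index].
FINAL_WEIGHTS = [
    [-1.0, 2.0, 0.5],   # weights for "negative review"
    [2.0, -1.0, 0.0],   # weights for "positive review"
]


def predict_with_explanation(
    concept_scores: List[float],
) -> Tuple[str, List[Tuple[str, float]]]:
    """Predict a class from concept scores and rank concepts by contribution."""
    # Each concept contributes weight * score to each class logit.
    contributions = [
        [w * s for w, s in zip(row, concept_scores)] for row in FINAL_WEIGHTS
    ]
    logits = [sum(row) for row in contributions]
    best = max(range(len(logits)), key=logits.__getitem__)
    # The explanation is the per-concept contribution to the winning class.
    ranked = sorted(
        zip(CONCEPTS, contributions[best]), key=lambda pair: pair[1], reverse=True
    )
    return CLASSES[best], ranked


# Assumed concept scores for one input text.
label, explanation = predict_with_explanation([0.9, 0.1, 0.3])
print(label)
for concept, contribution in explanation:
    print(f"  {concept}: {contribution:+.2f}")
```

Because the prediction depends only on the concept scores, each decision can be read off as a sum of named contributions. This is the property that a bottleneck design aims for, in contrast to post-hoc inspection of individual neurons.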

Code & Models

Repositories

trustworthy-ml-lab/cb-llms
pytorch

Taxonomy

Topics: Natural Language Processing Techniques