Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models

Goutham Rajendran; Simon Buchholz; Bryon Aragam; Bernhard Schölkopf; Pradeep Ravikumar

arXiv:2402.09236·cs.LG·December 10, 2024·2 cites

TL;DR

This paper unifies causal representation learning and foundation models to learn human-interpretable concepts from data, demonstrating theoretical recoverability and practical utility through experiments on synthetic data and large language models.

Contribution

It introduces a formal framework connecting causal and foundation model approaches, enabling provable recovery of interpretable concepts from diverse datasets.

Findings

01

Formally defined concepts can be provably recovered from diverse data.

02

Relating causal representation learning to foundation models yields a unified approach to learning human-interpretable concepts.

03

Experimental results on synthetic data and language models support the method.

Abstract

To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach.

Taxonomy

Topics: Topic Modeling · Bayesian Modeling and Causal Inference · Explainable Artificial Intelligence (XAI)