Sample-efficient Learning of Concepts with Theoretical Guarantees: from Data to Concepts without Interventions

Hidde Fokkema; Tim van Erven; Sara Magliacane

arXiv:2502.06536·stat.ML·October 24, 2025

Open Access

TL;DR

This paper introduces a framework for learning interpretable concepts from high-dimensional data with theoretical guarantees, eliminating the need for interventions and reducing reliance on assumptions like concept independence.

Contribution

It proposes a novel causal representation learning approach that aligns latent causal variables with concepts using minimal labels, with proven theoretical guarantees and improved interpretability.

Findings

01 Learned concepts have fewer impurities and higher accuracy.

02 Framework works without interventions or strong assumptions.

03 Effective in synthetic and image benchmark datasets.

Abstract

Machine learning is a vital part of many real-world systems, but several concerns remain about the lack of interpretability, explainability and robustness of black-box AI systems. Concept Bottleneck Models (CBMs) address some of these challenges by learning interpretable concepts from high-dimensional data, e.g., images, which are used to predict labels. An important issue in CBMs is spurious correlations between concepts, which effectively lead to learning "wrong" concepts. Current mitigation strategies rely on strong assumptions, e.g., they assume that the concepts are statistically independent of each other, or require substantial interaction in terms of both interventions and labels provided by annotators. In this paper, we describe a framework that provides theoretical guarantees on the correctness of the learned concepts and on the number of required labels, without requiring any…
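To make the Concept Bottleneck Model structure mentioned in the abstract concrete, here is a minimal sketch of a generic two-stage forward pass: inputs are mapped to concept scores, and the label is predicted from those concepts alone. This is not the paper's framework or its estimator. The layer sizes, the random weights, and the function names (`predict_concepts`, `predict_label`) are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(weights, bias, x):
    """Return W x + b, with W given as a list of rows."""
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def predict_concepts(x, W_c, b_c):
    """Stage 1: map input features to concept scores in [0, 1]."""
    return [sigmoid(z) for z in linear(W_c, b_c, x)]


def predict_label(concepts, W_y, b_y):
    """Stage 2: predict the label from the concepts only (the bottleneck)."""
    scores = linear(W_y, b_y, concepts)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical sizes: 8 input features, 3 concepts, 2 classes.
rng = random.Random(0)
n_in, n_concepts, n_classes = 8, 3, 2

# Untrained random weights, used here only to exercise the forward pass.
W_c = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_concepts)]
b_c = [0.0] * n_concepts
W_y = [[rng.gauss(0, 1) for _ in range(n_concepts)] for _ in range(n_classes)]
b_y = [0.0] * n_classes

x = [rng.gauss(0, 1) for _ in range(n_in)]
concepts = predict_concepts(x, W_c, b_c)
label = predict_label(concepts, W_y, b_y)
```

Because the label depends on the input only through `concepts`, a concept that has absorbed a spurious correlation passes it straight to the prediction, which is the failure mode the abstract describes.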

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification