A Geometric Unification of Concept Learning with Concept Cones
Alexandre Rocchi--Henry, Thomas Fel, Gianni Franchi

TL;DR
This paper unifies concept learning paradigms by revealing their shared geometric structure as concept cones, and introduces metrics to evaluate how well unsupervised methods approximate human-defined concepts.
Contribution
It demonstrates that supervised and unsupervised concept learning methods share a geometric framework and proposes a containment-based evaluation to compare their learned concepts.
Findings
Both paradigms learn linear directions forming concept cones.
Metrics link inductive biases to concept emergence.
Optimal sparsity and expansion maximize alignment with human concepts.
Abstract
Two traditions of interpretability have evolved side by side but seldom spoken to each other: Concept Bottleneck Models (CBMs), which prescribe what a concept should be, and Sparse Autoencoders (SAEs), which discover what concepts emerge. While CBMs use supervision to align activations with human-labeled concepts, SAEs rely on sparse coding to uncover emergent ones. We show that both paradigms instantiate the same geometric structure: each learns a set of linear directions in activation space whose nonnegative combinations form a concept cone. Supervised and unsupervised methods thus differ not in kind but in how they select this cone. Building on this view, we propose an operational bridge between the two paradigms. CBMs provide human-defined reference geometries, while SAEs can be evaluated by how well their learned cones approximate or contain those of CBMs. This containment…
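To make the geometric picture concrete, here is a minimal sketch of a concept cone and a toy containment check. It is an illustration under stated assumptions, not the paper's metric. The function names `project_onto_cone` and `containment_score`, the projected-gradient solver, and the scoring rule (one minus the relative residual) are all hypothetical choices made for this example. A cone is represented by its generating directions; a reference direction counts as contained when some nonnegative combination of the learned directions reproduces it.

```python
import math


def _dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def project_onto_cone(directions, x, steps=500):
    """Approximate the projection of x onto the cone spanned by `directions`.

    Minimizes ||x - sum_i a_i * d_i||^2 subject to a_i >= 0 using projected
    gradient descent (an assumed solver, chosen only to keep the sketch
    dependency-free). Returns (coefficients, residual_norm).
    Assumes at least one direction is nonzero.
    """
    k = len(directions)
    gram = [[_dot(di, dj) for dj in directions] for di in directions]
    b = [_dot(d, x) for d in directions]
    # The trace of the Gram matrix upper-bounds its largest eigenvalue,
    # so 1 / trace is a safe step size.
    step = 1.0 / sum(gram[i][i] for i in range(k))
    coeffs = [0.0] * k
    for _ in range(steps):
        grad = [_dot(gram[i], coeffs) - b[i] for i in range(k)]
        # Gradient step followed by clipping to the nonnegative orthant.
        coeffs = [max(0.0, c - step * g) for c, g in zip(coeffs, grad)]
    recon = [sum(c * d[j] for c, d in zip(coeffs, directions))
             for j in range(len(x))]
    residual = math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, recon)))
    return coeffs, residual


def containment_score(reference_dirs, learned_dirs):
    """Hypothetical score in [0, 1]: mean over reference directions of
    1 - (residual / norm), where 1 means the direction lies in the cone."""
    scores = []
    for ref in reference_dirs:
        _, residual = project_onto_cone(learned_dirs, ref)
        scores.append(1.0 - residual / math.sqrt(_dot(ref, ref)))
    return sum(scores) / len(scores)


# Toy data (made up): the learned cone is the nonnegative quadrant in 2D.
learned = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 1.0], [-1.0, 0.0]]  # first lies inside, second outside
score = containment_score(reference, learned)
print(f"containment score: {score:.3f}")
```

In this toy case one reference direction lies inside the learned cone and the other cannot be approximated at all, so the score lands at about one half. For real activation spaces, an exact nonnegative least-squares solver such as `scipy.optimize.nnls` would be a more practical choice than the hand-rolled loop.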
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference · Advanced Graph Neural Networks
