Concept frustration: Aligning human concepts and machine representations
Enrico Parisini, Christopher J. Soelistyo, Ahab Isaac, Alessandro Barp, Christopher R.S. Banerji

TL;DR
This paper introduces a geometric framework for detecting and analyzing concept frustration in AI models, with the aim of better aligning human concepts with machine representations for safer, more interpretable AI.
Contribution
It formalizes concept frustration, develops task-aligned similarity measures to detect it, and demonstrates how addressing frustration enhances alignment in language and vision models.
Findings
Concept frustration can be detected in foundation model representations.
Incorporating frustrating concepts reorganizes learned concept geometry.
The framework aids in diagnosing incomplete concept ontologies.
Abstract
Aligning human-interpretable concepts with the internal representations learned by modern machine learning systems remains a central challenge for interpretable AI. We introduce a geometric framework for comparing supervised human concepts with unsupervised intermediate representations extracted from foundation model embeddings. Motivated by the role of conceptual leaps in scientific discovery, we formalise the notion of concept frustration: a contradiction that arises when an unobserved concept induces relationships between known concepts that cannot be made consistent within an existing ontology. We develop task-aligned similarity measures that detect concept frustration between supervised concept-based models and unsupervised representations derived from foundation models, and show that the phenomenon is detectable in task-aligned geometry while conventional Euclidean comparisons…
