Learning Interpretable Concept-Based Models with Human Feedback
Isaac Lage, Finale Doshi-Velez

TL;DR
This paper introduces a method for learning interpretable, concept-based models from high-dimensional tabular data using user-labeled features, improving transparency and efficiency over traditional instance-labeling approaches.
Contribution
The authors propose a novel approach that learns transparent concept definitions through user-labeled features, enhancing interpretability and efficiency in high-dimensional data settings.
Findings
Concept-based models trained with feature labeling outperform instance-labeling methods at learning ground-truth concepts.
The approach maintains high predictive performance while improving interpretability.
The approach is shown to be effective on real-world and clinical datasets using simulated user feedback.
Abstract
Machine learning models that first learn a representation of a domain in terms of human-understandable concepts, then use it to make predictions, have been proposed to facilitate interpretation and interaction with models trained on high-dimensional data. However, these methods have important limitations: the way they define concepts is not inherently interpretable, and they assume that concept labels either exist for individual instances or can easily be acquired from users. These limitations are particularly acute for high-dimensional tabular features. We propose an approach for learning a set of transparent concept definitions in high-dimensional tabular data that relies on users labeling concept features instead of individual instances. Our method produces concepts that both align with users' intuitive sense of what a concept means, and facilitate prediction of the downstream label…
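The difference between labeling concept features and labeling individual instances can be made concrete with a toy sketch. This is not the authors' algorithm: the summary does not say what form the concept definitions take, so the rule used here (a concept is active when any of its user-labeled features is active) is an assumption, and the feature names, concept names, and data are all hypothetical.

```python
# Toy illustration only. The OR rule, feature names, concept names, and data
# are assumptions made for this sketch, not details taken from the paper.

def concept_activations(row, concept_features):
    """Map one binary feature row to {concept: 0/1}.

    A concept is switched on if any feature the user assigned to it is on.
    """
    return {
        concept: int(any(row[feature] for feature in features))
        for concept, features in concept_features.items()
    }

# Binary tabular data: each row maps a feature name to 0 or 1.
rows = [
    {"metformin": 1, "insulin": 0, "albuterol": 0, "inhaled_steroid": 0},
    {"metformin": 0, "insulin": 0, "albuterol": 1, "inhaled_steroid": 1},
    {"metformin": 0, "insulin": 0, "albuterol": 0, "inhaled_steroid": 0},
]

# Feature-level feedback: the user assigns features to concepts once,
# instead of annotating every row with the concepts it exhibits.
concept_features = {
    "diabetes": ["metformin", "insulin"],
    "asthma": ["albuterol", "inhaled_steroid"],
}

# Concept representation for every row, derived from the feature labels alone.
activations = [concept_activations(row, concept_features) for row in rows]
```

In this sketch the labeling effort scales with the number of features assigned to concepts rather than with the number of rows, and each concept definition can be read directly as a list of features. A downstream predictor could then be trained on `activations` in place of the raw features.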
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Machine Learning in Healthcare
