Towards Robust Metrics for Concept Representation Evaluation
Mateo Espinosa Zarlenga, Pietro Barbiero, Zohreh Shams, Dmitry Kazhdan, Umang Bhatt, Adrian Weller, Mateja Jamnik

TL;DR
This paper introduces new metrics for evaluating the purity and robustness of concept representations in deep learning, addressing limitations of existing disentanglement metrics and providing better benchmarks for concept learning models.
Contribution
It proposes novel metrics tailored for concept learning, demonstrating their effectiveness over existing metrics and revealing insights about supervision and representation purity.
Findings
Existing disentanglement metrics are unsuitable for concept learning.
Proposed metrics better evaluate concept purity and robustness.
Supervision alone may not ensure pure concept representations.
Abstract
Recent work on interpretability has focused on concept-based explanations, where deep learning models are explained in terms of high-level units of information, referred to as concepts. Concept learning models, however, have been shown to be prone to encoding impurities in their representations, failing to fully capture meaningful features of their inputs. While concept learning lacks metrics to measure such phenomena, the field of disentanglement learning has explored the related notion of underlying factors of variation in the data, with plenty of metrics to measure the purity of such factors. In this paper, we show that such metrics are not appropriate for concept learning and propose novel metrics for evaluating the purity of concept representations in both approaches. We show the advantage of these metrics over existing ones and demonstrate their utility in evaluating the…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
