Clarity: The Flexibility-Interpretability Trade-Off in Sparsity-aware Concept Bottleneck Models
Konstantinos P. Panousis, Diego Marcos

TL;DR
This paper introduces Clarity, a metric for evaluating the balance between performance and interpretability in sparsity-aware Concept Bottleneck Models (CBMs). It reveals a trade-off between model flexibility and semantic interpretability, and it correlates more strongly with human trust than standard metrics.
Contribution
The work presents Clarity, a novel interpretability metric, and systematically analyzes how modeling choices affect semantic alignment and the interpretability-performance trade-off in CBMs.
Findings
Clarity correlates more strongly with human trust than standard metrics.
Different sparsity strategies exhibit distinct behaviors at similar performance levels.
There is a fundamental trade-off between model flexibility and semantic interpretability.
Abstract
The widespread adoption of deep learning models in computer vision has intensified concerns about interpretability. Despite strong performance, these models are often treated as black boxes, with limited systematic investigation of their decision-making processes. While many interpretability methods exist, objective evaluation of learned representations remains limited, particularly for approaches that rely on sparsity to "induce" interpretability. In this work, we investigate how modeling choices in Concept Bottleneck Models (CBMs) affect the semantic alignment of concept representations. We introduce Clarity, a novel metric that captures the interplay between downstream performance and the sparsity and precision of concept activations. Using an interpretability assessment framework grounded in datasets with ground-truth concept annotations, we evaluate both VLM- and attribute…
