TL;DR
This paper introduces ConSim, an automated framework that uses large language models as simulators to evaluate concept-based explanations by how well the explanations let the simulator predict the explained model's outputs.
Contribution
We propose an evaluation framework for concept explanations that jointly assesses the quality of the concept space and how interpretable it is to the reader, using LLMs as simulators to enable scalable and consistent assessment.
Findings
LLMs provide reliable rankings of explanation methods.
The framework enables end-to-end evaluation of concept explanations.
Our empirical study demonstrates the effectiveness of the proposed approach.
Abstract
Concept-based explanations work by mapping complex model computations to human-understandable concepts. Evaluating such explanations is difficult because it involves not only the quality of the induced space of possible concepts but also how effectively the chosen concepts are communicated to users. Existing evaluation metrics often focus solely on the former, neglecting the latter. We introduce an evaluation framework for measuring concept explanations via automated simulatability: a simulator's ability to predict the explained model's outputs based on the provided explanations. This approach accounts for both the concept space and its interpretation in an end-to-end evaluation. Human studies for simulatability are notoriously difficult to conduct, particularly at the scale of a wide, comprehensive empirical evaluation (which is the subject of this work). We propose using large language…
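The simulatability idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `simulatability`, the toy keyword-based `toy_simulator` (standing in for an LLM), the sample texts and labels, and the comparison against a no-explanation baseline are all assumptions made for illustration. The score is simply the fraction of inputs on which the simulator, given an explanation, reproduces the explained model's output.

```python
from typing import Callable, Sequence


def simulatability(
    inputs: Sequence[str],
    model_outputs: Sequence[str],
    simulator: Callable[[str, str], str],
    explanation: str,
) -> float:
    """Fraction of inputs where the simulator predicts the explained model's output."""
    if len(inputs) != len(model_outputs):
        raise ValueError("inputs and model_outputs must have the same length")
    if not inputs:
        raise ValueError("at least one input is required")
    hits = sum(
        simulator(explanation, text) == output
        for text, output in zip(inputs, model_outputs)
    )
    return hits / len(inputs)


def toy_simulator(explanation: str, text: str) -> str:
    """Hypothetical stand-in for an LLM simulator: a single keyword rule.

    It predicts "billing" only when the explanation mentions the concept
    "refund" and the input text contains it; otherwise it predicts "other".
    """
    if "refund" in explanation.lower() and "refund" in text.lower():
        return "billing"
    return "other"


# Made-up inputs and made-up outputs of the model being explained.
texts = ["I want a refund", "App crashes on login", "Refund not received"]
model_outputs = ["billing", "technical", "billing"]

# Score with a concept explanation versus an empty-explanation baseline.
with_explanation = simulatability(
    texts, model_outputs, toy_simulator, "concept 'refund' indicates class 'billing'"
)
baseline = simulatability(texts, model_outputs, toy_simulator, "")
```

In this toy setup the explanation helps the simulator on the two refund-related inputs and not on the third, so the score with the explanation is higher than the baseline. Comparing the two scores is one plausible way to express how much an explanation contributes; the paper's actual protocol may differ.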