Evaluating Readability and Faithfulness of Concept-based Explanations

Meng Li; Haoran Jin; Ruixuan Huang; Zhihao Xu; Defu Lian; Zijia Lin; Di Zhang; Xiting Wang

arXiv:2404.18533·cs.AI·October 7, 2024

TL;DR

This paper proposes a formal framework for evaluating the faithfulness and readability of concept-based explanations in large language models, addressing the evaluation challenges that stem from their non-local nature and high-dimensional representation.

Contribution

It introduces a unified formalization of concepts, a perturbation-based faithfulness measure, and an automatic readability metric, along with a meta-evaluation method for explanation assessment.

Findings

01. Quantifies faithfulness through optimized perturbations in high-dimensional space.

02. Provides an automatic measure for readability based on pattern coherence.

03. Conducts extensive experiments to guide evaluation measure selection.

Abstract

With the growing popularity of general-purpose Large Language Models (LLMs) comes a need for more global explanations of model behaviors. Concept-based explanations arise as a promising avenue for explaining high-level patterns learned by LLMs. Yet their evaluation poses unique challenges, especially due to their non-local nature and high-dimensional representation in a model's hidden space. Current methods approach concepts from different perspectives and lack a unified formalization. This makes evaluating the core measures of concepts, namely faithfulness and readability, challenging. To bridge the gap, we introduce a formal definition of concepts that generalizes to the settings of diverse concept-based explanations. Based on this, we quantify the faithfulness of a concept explanation via perturbation. We ensure adequate perturbation in the high-dimensional space for different concepts via an…
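
To make the idea of perturbation-based faithfulness concrete, here is a minimal sketch under stated assumptions. It is not the paper's measure: the abstract refers to optimized perturbations, whereas this toy version simply projects a concept direction out of the hidden states and compares output distributions before and after. The function names (`ablate_concept`, `faithfulness_score`), the linear output head, and the random data are all hypothetical and chosen for illustration only.

```python
import numpy as np

def ablate_concept(hidden, direction):
    """Remove the component of each hidden vector along a concept direction."""
    unit = direction / np.linalg.norm(direction)
    return hidden - np.outer(hidden @ unit, unit)

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def faithfulness_score(hidden, direction, head):
    """Mean KL divergence between outputs before and after ablation.

    Assumption: a larger output change means the concept direction matters
    more to the (toy) model, so it is treated as more faithful.
    """
    p = softmax(hidden @ head)
    q = softmax(ablate_concept(hidden, direction) @ head)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(kl.mean())

# Toy setup (hypothetical): 8 hidden vectors of size 16, linear head over 5 classes.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(8, 16))
head = rng.normal(size=(16, 5))

# A direction the head reads from, and one orthogonal to everything it reads.
used = head[:, 0]
u, _, _ = np.linalg.svd(head, full_matrices=True)
unused = u[:, -1]  # lies in the orthogonal complement of the head's column space

print("used direction:  ", faithfulness_score(hidden, used, head))
print("unused direction:", faithfulness_score(hidden, unused, head))
```

In this toy setting, ablating the direction the head depends on shifts the output distribution, while ablating the orthogonal direction leaves it essentially unchanged. That contrast is the behavior a perturbation-based faithfulness score is meant to capture.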

Code & Models

Repositories

hr-jin/concept-explanation-evaluation
PyTorch · Official

Videos

Evaluating Readability and Faithfulness of Concept-based Explanations · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques