On Measuring Faithfulness or Self-consistency of Natural Language Explanations

Letitia Parcalabescu; Anette Frank

arXiv:2311.07466 · cs.CL · September 20, 2024 · 2 cites

TL;DR

This paper argues that current faithfulness tests for LLM explanations actually measure self-consistency rather than faithfulness, and introduces a new benchmark and CC-SHAP, a fine-grained self-consistency measure, as steps toward better assessing model faithfulness.

Contribution

It clarifies the distinction between faithfulness and self-consistency and introduces CC-SHAP, a fine-grained measure for analyzing LLM self-consistency and explanation faithfulness.

Findings

1. Existing tests mainly measure self-consistency, not true faithfulness.
2. The Comparative Consistency Bank compares existing tests across 11 open LLMs and 5 tasks.
3. CC-SHAP provides a fine-grained analysis of how the model's input contributes to its answers and to its explanations.
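The findings above describe CC-SHAP only at a high level. As a rough illustration, here is a minimal Python sketch of the comparison step, assuming that per-input-token contribution scores (e.g., Shapley values) have already been computed for the answer and for each generated explanation token, that explanation contributions are averaged over explanation tokens, and that the two contribution profiles are compared with cosine similarity. These choices and the function names `aggregate_explanation_contributions`, `cosine_similarity`, and `cc_shap_sketch` are hypothetical and not taken from the authors' code; consult the paper for the exact definition of CC-SHAP.

```python
import math
from typing import Sequence


def aggregate_explanation_contributions(
    per_token_contribs: Sequence[Sequence[float]],
) -> list[float]:
    """Average each input token's contribution over all explanation tokens.

    Assumption: per_token_contribs[j][i] is a precomputed contribution score
    (e.g., a Shapley value) of input token i for generated explanation token j.
    """
    if not per_token_contribs:
        raise ValueError("need at least one explanation token")
    n_inputs = len(per_token_contribs[0])
    if any(len(row) != n_inputs for row in per_token_contribs):
        raise ValueError("all rows must cover the same input tokens")
    n_rows = len(per_token_contribs)
    return [sum(row[i] for row in per_token_contribs) / n_rows for i in range(n_inputs)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cc_shap_sketch(
    answer_contribs: Sequence[float],
    explanation_token_contribs: Sequence[Sequence[float]],
) -> float:
    """Hypothetical CC-SHAP-style score.

    +1: the answer and the explanation draw on the input tokens in the same
    proportions (same signs); -1: opposite; near 0: unrelated profiles.
    """
    explanation_contribs = aggregate_explanation_contributions(explanation_token_contribs)
    return cosine_similarity(answer_contribs, explanation_contribs)


if __name__ == "__main__":
    # Toy contributions for a 3-token input; not data from the paper.
    answer = [0.5, 0.1, -0.2]
    explanation = [[0.4, 0.0, -0.1], [0.6, 0.2, -0.3]]
    print(round(cc_shap_sketch(answer, explanation), 6))
```

In the toy example the averaged explanation profile equals the answer profile, so the score is 1.0. Because cosine similarity is scale-invariant, rescaling either contribution vector leaves the score unchanged; a distance-based comparison would instead require normalizing both vectors first.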

Abstract

Large language models (LLMs) can explain their predictions through post-hoc or Chain-of-Thought (CoT) explanations. But an LLM could make up reasonable-sounding explanations that are unfaithful to its underlying reasoning. Recent work has designed tests that aim to judge the faithfulness of post-hoc or CoT explanations. In this work we argue that these faithfulness tests do not measure faithfulness to the models' inner workings -- but rather their self-consistency at output level. Our contributions are three-fold: i) We clarify the status of faithfulness tests in view of model explainability, characterising them as self-consistency tests instead. We underline this assessment by ii) constructing a Comparative Consistency Bank for self-consistency tests that for the first time compares existing tests on a common suite of 11 open LLMs and 5 tasks -- including iii) our new self-consistency…

Videos

[Own work] On Measuring Faithfulness or Self-consistency of Natural Language Explanations · YouTube

On Measuring Faithfulness or Self-consistency of Natural Language Explanations · Underline

Taxonomy

Topics: Topic Modeling