Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments

Marharyta Domnich; Julius Välja; Rasmus Moorits Veski; Giacomo Magnifico; Kadi Tulver; Eduard Barbu; Raul Vicente

arXiv:2410.21131·cs.AI·April 23, 2025

Open Access · 1 repository · 1 dataset

TL;DR

This paper proposes a unified, human-centric evaluation framework for counterfactual explanations in which large language models are fine-tuned to predict human ratings, making evaluations more comparable and scalable.

Contribution

The paper introduces a diverse set of 30 counterfactual scenarios, collects ratings on 8 evaluation metrics from 206 respondents, and fine-tunes LLMs to predict average or individual human judgments across these metrics.

Findings

01. LLMs achieved up to 63% accuracy in zero-shot evaluations.

02. Fine-tuned models reached 85% accuracy in predicting human ratings.

03. The approach improves comparability and scalability of explanation evaluations.

Abstract

As machine learning models evolve, maintaining transparency demands more human-centric explainable AI techniques. Counterfactual explanations, with roots in human reasoning, identify the minimal input changes needed to obtain a given output and, hence, are crucial for supporting decision-making. Despite their importance, the evaluation of these explanations often lacks grounding in user studies and remains fragmented, with existing metrics not fully capturing human perspectives. To address this challenge, we developed a diverse set of 30 counterfactual scenarios and collected ratings across 8 evaluation metrics from 206 respondents. Subsequently, we fine-tuned different Large Language Models (LLMs) to predict average or individual human judgment across these metrics. Our methodology allowed LLMs to achieve an accuracy of up to 63% in zero-shot evaluations and 85% (over a 3-classes…
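To make the abstract's definition concrete, here is a minimal sketch of what a counterfactual explanation is: the smallest input change that flips a model's output. The toy loan rule `approve`, its threshold, and the helper `nearest_counterfactual` are hypothetical illustrations, not taken from the paper or its repository, and the brute-force search is only one simple way to find such a change.

```python
from itertools import product


def approve(income, debt):
    """Hypothetical toy model: approve when income minus twice the debt is at least 40."""
    return income - 2 * debt >= 40


def nearest_counterfactual(income, debt, step=5, max_steps=10):
    """Brute-force search for the fewest grid steps that flip the decision.

    Returns (cost_in_steps, new_income, new_debt), or None if no flip is found.
    """
    original = approve(income, debt)
    best = None
    for d_income, d_debt in product(range(-max_steps, max_steps + 1), repeat=2):
        new_income = income + d_income * step
        new_debt = debt + d_debt * step
        if new_income < 0 or new_debt < 0:
            continue  # skip implausible negative values
        if approve(new_income, new_debt) != original:
            cost = abs(d_income) + abs(d_debt)  # L1 distance measured in steps
            if best is None or cost < best[0]:
                best = (cost, new_income, new_debt)
    return best


if __name__ == "__main__":
    # An applicant with income 50 and debt 10 is rejected by the toy rule.
    print(approve(50, 10))
    # The counterfactual reads: "had your debt been 5, you would have been approved."
    print(nearest_counterfactual(50, 10))
```

Human raters in a study like the one described would then judge such an explanation on qualitative metrics; the sketch covers only how a counterfactual is generated, not how it is rated.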

Code & Models

Repositories

anitera/countereval
PyTorch · Official

Datasets

anitera/CounterEval
dataset · 13 downloads

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling

Methods: Sparse Evolutionary Training