LIBERTy: A Causal Framework for Benchmarking Concept-Based Explanations of LLMs with Structural Counterfactuals

Gilat Toker; Nitay Calderon; Ohad Amosy; Roi Reichart

arXiv:2601.10700 · cs.CL · January 21, 2026

TL;DR

LIBERTy introduces a novel framework for benchmarking concept-based explanations of large language models using structural counterfactuals derived from explicit causal models, enabling more faithful and systematic evaluation.

Contribution

The paper presents LIBERTy, a framework and datasets for constructing structural counterfactuals in text generation, improving the evaluation of concept-based explanations for LLMs.

Findings

01 Existing concept-based explanation methods leave significant room for improvement.

02 Proprietary LLMs show reduced sensitivity to demographic concepts.

03 LIBERTy enables systematic analysis of model sensitivity to interventions.

Abstract

Concept-based explanations quantify how high-level concepts (e.g., gender or experience) influence model behavior, which is crucial for decision-makers in high-stakes domains. Recent work evaluates the faithfulness of such explanations by comparing them to reference causal effects estimated from counterfactuals. In practice, existing benchmarks rely on costly human-written counterfactuals that serve as an imperfect proxy. To address this, we introduce a framework for constructing datasets containing structural counterfactual pairs: LIBERTy (LLM-based Interventional Benchmark for Explainability with Reference Targets). LIBERTy is grounded in explicitly defined Structural Causal Models (SCMs) of text generation: interventions on a concept propagate through the SCM until an LLM generates the counterfactual. We introduce three datasets (disease detection, CV screening, and workplace…
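To make the idea of a structural counterfactual pair concrete, here is a minimal Python sketch. It is not the paper's SCM, datasets, or code. The two-concept causal graph, the template in `render_text` (standing in for the LLM generator), the scorer `toy_model` (standing in for the explained model), and all function names are assumptions made purely for illustration. The sketch holds the exogenous noise fixed, intervenes on one concept, lets the change propagate to a downstream concept, and averages the resulting change in model output as a reference effect.

```python
import random
from statistics import mean

def structural_equations(u, do=None):
    """Toy SCM (hypothetical): experience -> leadership. `u` is exogenous noise."""
    do = do or {}
    experience = do.get("experience", "senior" if u["u_exp"] < 0.5 else "junior")
    # The downstream concept depends on its parent and on its own noise term.
    threshold = 0.8 if experience == "senior" else 0.3
    leadership = do.get("leadership", u["u_lead"] < threshold)
    return {"experience": experience, "leadership": leadership}

def render_text(concepts):
    """Template stand-in for the LLM that would write the text."""
    text = f"A {concepts['experience']} engineer"
    if concepts["leadership"]:
        return text + " who has led a team."
    return text + " who works as an individual contributor."

def toy_model(text):
    """Stand-in for the model being explained: a keyword scorer."""
    score = 0.0
    if "senior" in text:
        score += 0.5
    if "led a team" in text:
        score += 0.3
    return score

def counterfactual_pair(u, concept, value):
    """Same noise `u`, with and without the intervention do(concept=value)."""
    factual = render_text(structural_equations(u))
    counterfactual = render_text(structural_equations(u, do={concept: value}))
    return factual, counterfactual

def reference_effect(n, concept, value, seed=0):
    """Average change in model output across n sampled counterfactual pairs."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        u = {"u_exp": rng.random(), "u_lead": rng.random()}
        factual, counterfactual = counterfactual_pair(u, concept, value)
        diffs.append(toy_model(counterfactual) - toy_model(factual))
    return mean(diffs)

if __name__ == "__main__":
    pair = counterfactual_pair({"u_exp": 0.9, "u_lead": 0.5}, "experience", "senior")
    print(pair[0])
    print(pair[1])
    print(reference_effect(1000, "experience", "senior"))
```

In this toy setup, an explanation method's estimate of how much "experience" matters to `toy_model` would be compared against the value returned by `reference_effect`. Note that the intervention on experience can also flip the downstream leadership concept, which is the propagation step that distinguishes a structural counterfactual from a single-word edit.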

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications