Understanding Large Language Model Behaviors through Interactive Counterfactual Generation and Analysis

Furui Cheng; Vilém Zouhar; Robin Shing Moon Chan; Daniel Fürst; Hendrik Strobelt; Mennatallah El-Assady

arXiv:2405.00708 · cs.CL · August 8, 2025 · 3 cites

TL;DR

This paper introduces LLM Analyzer, an interactive system for exploring large language model behaviors through efficient counterfactual generation and analysis, emphasizing human-in-the-loop understanding.

Contribution

It presents a novel interactive visualization system with a new algorithm for generating meaningful counterfactuals to improve LLM interpretability.

Findings

01 The system enables intuitive exploration of LLM behaviors.

02 The generated counterfactuals are fluent and semantically meaningful.

03 A user study confirms the system's usability and effectiveness.

Abstract

Understanding the behavior of large language models (LLMs) is crucial for ensuring their safe and reliable use. However, existing explainable AI (XAI) methods for LLMs primarily rely on word-level explanations, which are often computationally inefficient and misaligned with human reasoning processes. Moreover, these methods often treat explanation as a one-time output, overlooking its inherently interactive and iterative nature. In this paper, we present LLM Analyzer, an interactive visualization system that addresses these limitations by enabling intuitive and efficient exploration of LLM behaviors through counterfactual analysis. Our system features a novel algorithm that generates fluent and semantically meaningful counterfactuals via targeted removal and replacement operations at user-defined levels of granularity. These counterfactuals are used to compute feature attribution…
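
The abstract mentions counterfactuals built by removing or replacing parts of the input, which are then used to compute feature attributions. The sketch below illustrates only the general idea and is not the authors' algorithm: it assumes the input has already been split into segments, it uses removal only (no replacement and no fluency check), and it scores each segment by a simple presence-versus-absence mean difference. The names `toy_model`, `removal_counterfactuals`, and `segment_attribution` are hypothetical, and `toy_model` stands in for a real LLM call.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Mask = Tuple[bool, ...]


def removal_counterfactuals(segments: List[str]) -> List[Tuple[Mask, str]]:
    """Enumerate every text obtained by keeping a subset of the segments.

    Assumption: plain removal with no fluency filtering. The empty subset
    is included so that each segment is present in exactly half the texts.
    """
    n = len(segments)
    results = []
    for k in range(n + 1):
        for keep in combinations(range(n), k):
            mask = tuple(i in keep for i in range(n))
            text = " ".join(segments[i] for i in keep)
            results.append((mask, text))
    return results


def segment_attribution(
    segments: List[str], model: Callable[[str], float]
) -> Dict[str, float]:
    """Score each segment as mean(output | present) - mean(output | absent)."""
    scored = [(mask, model(text)) for mask, text in removal_counterfactuals(segments)]
    attributions = {}
    for i, segment in enumerate(segments):
        present = [y for mask, y in scored if mask[i]]
        absent = [y for mask, y in scored if not mask[i]]
        attributions[segment] = sum(present) / len(present) - sum(absent) / len(absent)
    return attributions


def toy_model(text: str) -> float:
    """Hypothetical stand-in for an LLM: 1.0 if the text mentions a fever."""
    return 1.0 if "fever" in text else 0.0


if __name__ == "__main__":
    parts = ["The patient has a fever", "and a mild cough", "since Monday"]
    for segment, score in segment_attribution(parts, toy_model).items():
        print(f"{score:+.2f}  {segment}")
```

Exhaustive enumeration grows as 2^n in the number of segments, so a sketch like this is only practical for a handful of coarse segments; choosing a coarser granularity is one way to keep the number of model calls manageable.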

Taxonomy

Topics: Digital Rights Management and Security · Semantic Web and Ontologies · Artificial Intelligence in Law

Methods: Counterfactual Explanations