Understanding Large Language Model Behaviors through Interactive Counterfactual Generation and Analysis
Furui Cheng, Vilém Zouhar, Robin Shing Moon Chan, Daniel Fürst, Hendrik Strobelt, Mennatallah El-Assady

TL;DR
This paper introduces LLM Analyzer, an interactive system for exploring large language model behaviors through efficient counterfactual generation and analysis, emphasizing human-in-the-loop understanding.
Contribution
It presents a novel interactive visualization system with a new algorithm for generating meaningful counterfactuals to improve LLM interpretability.
Findings
System enables intuitive exploration of LLM behaviors.
Counterfactuals are fluent and semantically meaningful.
User study confirms system's usability and effectiveness.
Abstract
Understanding the behavior of large language models (LLMs) is crucial for ensuring their safe and reliable use. However, existing explainable AI (XAI) methods for LLMs primarily rely on word-level explanations, which are often computationally inefficient and misaligned with human reasoning processes. Moreover, these methods often treat explanation as a one-time output, overlooking its inherently interactive and iterative nature. In this paper, we present LLM Analyzer, an interactive visualization system that addresses these limitations by enabling intuitive and efficient exploration of LLM behaviors through counterfactual analysis. Our system features a novel algorithm that generates fluent and semantically meaningful counterfactuals via targeted removal and replacement operations at user-defined levels of granularity. These counterfactuals are used to compute feature attribution…
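The idea of turning counterfactuals into feature attributions can be illustrated with a minimal sketch. It is not the paper's algorithm: it covers only the removal operation (no replacement), performs no fluency or semantic checks, and assumes the user has already split the prompt into segments at the chosen granularity. The names removal_counterfactuals, removal_attribution, and toy_predict are hypothetical, and toy_predict is a stand-in for a real LLM call that returns a scalar score.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def removal_counterfactuals(
    segments: List[str], max_removed: int = 1
) -> List[Tuple[Tuple[int, ...], str]]:
    """Build counterfactual prompts by dropping up to `max_removed` segments.

    Returns (removed_indices, counterfactual_text) pairs.
    """
    results = []
    for k in range(1, max_removed + 1):
        for removed in combinations(range(len(segments)), k):
            kept = [s for i, s in enumerate(segments) if i not in removed]
            results.append((removed, " ".join(kept)))
    return results


def removal_attribution(
    segments: List[str],
    predict: Callable[[str], float],
    max_removed: int = 1,
) -> List[float]:
    """Score each segment by the average drop in `predict` when it is removed.

    When several segments are removed together, the drop is split evenly
    among them (a simplifying assumption, not a claim about the paper).
    """
    base = predict(" ".join(segments))
    totals = [0.0] * len(segments)
    counts = [0] * len(segments)
    for removed, text in removal_counterfactuals(segments, max_removed):
        delta = base - predict(text)
        for i in removed:
            totals[i] += delta / len(removed)
            counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def toy_predict(text: str) -> float:
    """Hypothetical stand-in for an LLM score: 1.0 if 'refund' is mentioned."""
    return float("refund" in text.lower().split())


# Segments chosen by the user at phrase-level granularity.
segments = ["The customer", "wants a refund", "by Friday"]
scores = removal_attribution(segments, toy_predict)
for segment, score in zip(segments, scores):
    print(f"{score:+.2f}  {segment}")
```

Operating on user-defined segments rather than individual words keeps the number of counterfactuals small and the resulting scores easier to read, which is the motivation the abstract gives for moving beyond word-level explanations.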
Taxonomy
Topics: Digital Rights Management and Security · Semantic Web and Ontologies · Artificial Intelligence in Law
Methods: Counterfactual Explanations
