Learning to Explain: Prototype-Based Surrogate Models for LLM Classification

Bowen Wei; Mehrdad Fazli; Ziwei Zhu

arXiv:2505.18970 · cs.CL · June 3, 2025

TL;DR

ProtoSurE is a prototype-based surrogate framework that offers faithful, human-understandable explanations for LLMs. It outperforms existing explanation methods and needs relatively few training examples to produce effective explanations.

Contribution

The paper introduces ProtoSurE, an interpretable-by-design surrogate model that uses sentence-level prototypes to explain LLM decisions faithfully and efficiently.

Findings

01. Outperforms state-of-the-art explanation methods across diverse LLMs and datasets.

02. Demonstrates strong data efficiency, requiring relatively few training examples.

03. Provides human-understandable explanations that stay aligned with the target LLM's decisions.

Abstract

Large language models (LLMs) have demonstrated impressive performance on natural language tasks, but their decision-making processes remain largely opaque. Existing explanation methods either suffer from limited faithfulness to the model's reasoning or produce explanations that humans find difficult to understand. To address these challenges, we propose ProtoSurE, a novel prototype-based surrogate framework that provides faithful and human-understandable explanations for LLMs. ProtoSurE trains an interpretable-by-design surrogate model that aligns with the target LLM while using sentence-level prototypes as human-understandable concepts. Extensive experiments show that ProtoSurE consistently outperforms state-of-the-art explanation methods across diverse LLMs and datasets. Importantly, ProtoSurE demonstrates strong data efficiency, requiring relatively few training examples to achieve…
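The abstract describes the surrogate only at a high level. The sketch below illustrates the general idea of a prototype-based surrogate classifier under stated assumptions; it is not the authors' implementation. The following are all illustrative assumptions rather than details taken from the paper: precomputed sentence embeddings, cosine similarity, max-pooling each prototype's similarity over a document's sentences, a linear layer from prototype activations to class logits, a cross-entropy "alignment" loss against the target LLM's predicted label, and every function name shown (`prototype_activations`, `surrogate_forward`, `alignment_loss`, `explain`).

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def prototype_activations(sentences: List[Vector], prototypes: List[Vector]) -> List[float]:
    """Assumed scheme: each prototype's activation is its best match over all sentences."""
    return [max(cosine(s, p) for s in sentences) for p in prototypes]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def surrogate_forward(sentences, prototypes, weights, bias):
    """Hypothetical forward pass: prototype activations -> linear layer -> class probabilities.

    weights[c][j] is the (assumed) weight linking prototype j to class c.
    """
    acts = prototype_activations(sentences, prototypes)
    logits = [sum(w * a for w, a in zip(row, acts)) + b for row, b in zip(weights, bias)]
    return acts, softmax(logits)


def alignment_loss(probs: List[float], llm_label: int) -> float:
    """Cross-entropy against the target LLM's predicted label (not the gold label),
    so that training pushes the surrogate to mimic the LLM. An assumed objective."""
    return -math.log(max(probs[llm_label], 1e-12))


def explain(acts: List[float], weights, cls: int, k: int = 2) -> List[Tuple[int, float]]:
    """Rank prototypes by their contribution weights[cls][j] * acts[j] to class cls."""
    contrib = [(j, weights[cls][j] * a) for j, a in enumerate(acts)]
    return sorted(contrib, key=lambda t: t[1], reverse=True)[:k]


# Toy usage: one document with two 2-D "sentence embeddings" and three prototypes.
doc = [[1.0, 0.0], [0.6, 0.8]]
protos = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
W = [[2.0, 0.0, -1.0],   # class 0
     [-1.0, 1.5, 1.0]]   # class 1
b = [0.0, 0.0]
acts, probs = surrogate_forward(doc, protos, W, b)
top = explain(acts, W, cls=0)
loss = alignment_loss(probs, llm_label=0)
```

In this toy run the activations are roughly `[1.0, 0.8, -0.6]`, and class 0 gets most of the probability mass. `explain` attributes class 0 mainly to prototype 0. Because each explanation is a weighted sum over prototypes, and each prototype can be shown to a reader as a representative sentence, the prediction is interpretable by construction rather than explained after the fact.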

Taxonomy

Topics: Natural Language Processing Techniques