Compact Example-Based Explanations for Language Models

Loris Schoenegger; Benjamin Roth

arXiv:2601.03786 · cs.CL · April 10, 2026

3 Models, 1 Dataset

TL;DR

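This paper introduces a selection relevance score, a retraining-free metric for how well a set of training examples explains a language model's output, and uses it to show that a strategy balancing influence and representativeness yields better explanations than common selection strategies.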

Contribution

The paper proposes a retraining-free relevance metric that scores how useful a selected set of training examples is for explaining a language model's output.

Findings

01. The relevance score can predict whether a set of examples supports or undermines the model's predictions.

02. Common selection strategies often perform worse than random selection.

03. A strategy that balances influence and representativeness improves explanation quality.
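
To make the contrast between selection strategies concrete, the following is a minimal sketch, not the paper's method. It assumes each training document already has a precomputed influence score and an embedding vector, and it compares plain top-k selection with a hypothetical greedy strategy that trades influence against representativeness. The function names, the `alpha` weight, and the use of cosine-similarity coverage as a stand-in for representativeness are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_by_influence(influence, k):
    """Baseline: indices of the k documents with the highest influence."""
    order = sorted(range(len(influence)), key=lambda i: influence[i], reverse=True)
    return order[:k]

def balanced_selection(influence, embeddings, k, alpha=0.5):
    """Hypothetical greedy selection (an assumption, not the paper's algorithm).

    Each step adds the candidate maximizing
        alpha * influence + (1 - alpha) * coverage gain,
    where coverage is the mean, over all documents, of the similarity to the
    closest already-selected document.
    """
    n = len(influence)
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    best = [0.0] * n  # best similarity of each document to the selected set
    selected = []
    for _ in range(min(k, n)):
        def score(c):
            gain = sum(max(sim[c][j] - best[j], 0.0) for j in range(n)) / n
            return alpha * influence[c] + (1 - alpha) * gain
        candidates = [c for c in range(n) if c not in selected]
        choice = max(candidates, key=score)
        selected.append(choice)
        best = [max(best[j], sim[choice][j]) for j in range(n)]
    return selected

# Toy data: documents 0-2 are near-duplicates, document 3 is distinct.
influence = [0.9, 0.85, 0.8, 0.7]
embeddings = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

print(top_k_by_influence(influence, 2))              # [0, 1]
print(balanced_selection(influence, embeddings, 2))  # [0, 3]
```

On this toy data, top-k selection returns two redundant documents, while the balanced variant swaps the second one for the distinct document. Whether such a set explains a prediction better is the kind of question the paper's relevance score is meant to answer.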

Abstract

Training data influence estimation methods quantify the contribution of training documents to a model's output, making them a promising source of information for example-based explanations. As humans cannot interpret thousands of documents, only a small subset of the training data can be presented as an explanation. Although the choice of which documents to include directly affects explanation quality, previous evaluations of such systems have largely ignored any selection strategies. To address this, we propose a novel selection relevance score, a retraining-free metric that quantifies how useful a set of examples is for explaining a model's output. We validate this score through fine-tuning experiments, confirming that it can predict whether a set of examples supports or undermines the model's predictions. Using this metric, we further show that common selection strategies often…

Code & Models

Models

Datasets

loris3/tulu-3-sft-olmo-2-mixture-0225-sample (dataset, 836 downloads)
