BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models

Lindia Tjuatja; Graham Neubig

arXiv:2506.02204·cs.CL·June 11, 2025

TL;DR

BehaviorBox is an automated method for comparing two language models. It extracts coherent, context-specific features of text on which one model outperforms the other, exposing differences that corpus-level metrics such as perplexity do not show.

Contribution

This work introduces BehaviorBox, a novel automated approach for discovering detailed performance differences between language models using performance-aware contextual embeddings.

Findings

01 Identifies specific contextual features where models differ in performance

02 Reveals insights not captured by corpus-level perplexity measures

03 Applies to various models, sizes, and training methods

Abstract

Language model evaluation is a daunting task: prompts are brittle, corpus-level perplexities are vague, and the choice of benchmarks is endless. Finding examples that show meaningful, generalizable differences between two LMs is crucial to understanding where one model succeeds and another fails. Can this process be done automatically? In this work, we propose a methodology for automated comparison of language models that uses performance-aware contextual embeddings to find fine-grained features of text where one LM outperforms another. Our method, which we name BehaviorBox, extracts coherent features that demonstrate differences with respect to the ease of generation between two LMs. Specifically, BehaviorBox finds features that describe groups of words in fine-grained contexts, such as "conditional 'were' in the phrase 'if you were'" and "exclamation marks after emotional statements",…
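To make concrete the abstract's point that corpus-level perplexity is a vague signal, here is a minimal toy sketch. It is not the BehaviorBox method itself. The tokens and log-probabilities are invented for illustration, and the helper names `perplexity` and `per_token_gaps` are hypothetical. It shows two models with identical perplexity on a text that still differ sharply on individual tokens.

```python
import math

# Hypothetical per-token log-probabilities (natural log) that two models
# assign to the same six-token text. All numbers are invented.
tokens = ["if", "you", "were", "here", "now", "!"]
logp_a = [-1.0, -0.5, -0.2, -2.0, -1.5, -0.8]
logp_b = [-1.0, -0.5, -1.6, -2.0, -0.7, -0.2]


def perplexity(logps):
    """Perplexity = exp of the negative mean log-probability."""
    return math.exp(-sum(logps) / len(logps))


def per_token_gaps(tokens, logp_a, logp_b):
    """Return (token, logp_a - logp_b) pairs, largest absolute gap first.

    A positive gap means model A found the token easier to generate.
    """
    gaps = [(t, a - b) for t, a, b in zip(tokens, logp_a, logp_b)]
    return sorted(gaps, key=lambda g: abs(g[1]), reverse=True)


if __name__ == "__main__":
    print(f"perplexity A: {perplexity(logp_a):.3f}")
    print(f"perplexity B: {perplexity(logp_b):.3f}")
    for token, gap in per_token_gaps(tokens, logp_a, logp_b):
        print(f"{token!r}: {gap:+.2f}")
```

In this toy data the aggregate scores match, yet model A is much better on "were" while model B is better on "now" and "!". Grouping such token-level gaps by their context is the kind of fine-grained comparison the abstract describes automating.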

Taxonomy

Topics: Topic Modeling · Mental Health via Writing · Machine Learning in Healthcare