When Meaning Stays the Same, but Models Drift: Evaluating Quality of Service under Token-Level Behavioral Instability in LLMs

Xiao Li; Joel Kreuzwieser; Alan Peters

arXiv:2506.10095 · cs.CL · June 13, 2025

TL;DR

This paper introduces PBSS, a framework for measuring how large language models' responses change across prompts that share the same meaning but differ in their token-level wording. It reveals model-specific behavioral drift.

Contribution

The study presents a new diagnostic method for evaluating LLM stability under prompt rephrasing and suggests that tokenization and decoding may affect response consistency.

Findings

01

Model-specific response shifts under prompt variance

02

Statistical regularities linked to tokenization and decoding

03

Behavioral drift persists despite semantic equivalence

Abstract

We investigate how large language models respond to prompts that differ only in their token-level realization but preserve the same semantic intent, a phenomenon we call prompt variance. We propose Prompt-Based Semantic Shift (PBSS), a diagnostic framework for measuring behavioral drift in LLMs under semantically equivalent prompt rewordings. Applied to ten constrained tasks, PBSS reveals consistent, model-specific response shifts, suggesting statistical regularities linked to tokenization and decoding. These results highlight an overlooked dimension of model evaluation, stability under rephrasing, and suggest that tokenization strategies and decoding dynamics may contribute to post-training quality-of-service instability.
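
The abstract does not specify how PBSS scores drift. As a rough illustration only, one common way to quantify drift across semantically equivalent prompts is to embed each model response and average the pairwise cosine distances between those embeddings. The function names `cosine_distance` and `drift_score`, and the choice of cosine distance itself, are assumptions for this sketch, not the paper's definition; the embeddings are assumed to come from whatever sentence-embedding model the reader has available.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def drift_score(response_embeddings: Sequence[Sequence[float]]) -> float:
    """Hypothetical drift score (not the paper's PBSS metric).

    Averages the cosine distance over every pair of response embeddings,
    where each response answers a different rewording of the same prompt.
    0.0 means all responses point in the same direction; larger values
    mean the model's answers diverge more across the rewordings.
    """
    pairs = list(combinations(response_embeddings, 2))
    if not pairs:
        raise ValueError("need at least two responses to measure drift")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Toy 2-D "embeddings" standing in for real response embeddings.
stable = [[1.0, 0.0], [2.0, 0.0], [0.5, 0.0]]   # same direction
drifting = [[1.0, 0.0], [0.0, 1.0]]             # orthogonal
print(drift_score(stable))    # 0.0
print(drift_score(drifting))  # 1.0
```

Averaging over all pairs makes the score independent of which rewording is treated as the reference prompt. For constrained tasks with short, discrete answers, a string-level measure such as exact-match agreement across rewordings would be an equally plausible alternative to embedding distance.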

Code & Models

Repositories

Xiao-Vandy/LLM-Prompt-Variance-Diagnostic-Analysis (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification