Revisiting Prompt Sensitivity in Large Language Models for Text Classification: The Role of Prompt Underspecification

Branislav Pecher; Michal Spiegel; Robert Belanec; Jan Cegin

arXiv:2602.04297 · cs.CL · February 5, 2026

TL;DR

This paper investigates how prompt underspecification affects the sensitivity of large language models in text classification, revealing that minimal prompts cause higher performance variance and that sensitivity largely emerges in final layers.

Contribution

The paper systematically compares underspecified prompts with prompts that give specific instructions, and argues that a significant portion of the prompt sensitivity reported in earlier work can be attributed to prompt underspecification.

Findings

01

Underspecified prompts lead to higher performance variance.

02

The effect of prompt underspecification shows up mainly in the model's final layers rather than in its internal representations.

03

Prompts that provide specific instructions are more robust to prompt variations.

Abstract

Large language models (LLMs) are widely used as zero-shot and few-shot classifiers, where task behaviour is largely controlled through prompting. A growing number of works have observed that LLMs are sensitive to prompt variations, with small changes leading to large changes in performance. However, in many cases, the investigation of sensitivity is performed using underspecified prompts that provide minimal task instructions and weakly constrain the model's output space. In this work, we argue that a significant portion of the observed prompt sensitivity can be attributed to prompt underspecification. We systematically study and compare the sensitivity of underspecified prompts and prompts that provide specific instructions. Utilising performance analysis, logit analysis, and linear probing, we find that underspecified prompts exhibit higher performance variance and lower logit values…
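To make "performance variance" concrete, the following is a minimal sketch of one common way to quantify prompt sensitivity: score the same test set under several prompt variants and report the spread of the resulting accuracies. It is not the authors' code. The function names, the variant labels, and the toy predictions are all hypothetical and chosen only for illustration; they are not results from the paper.

```python
from statistics import mean, pstdev


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def prompt_sensitivity(predictions_by_prompt, gold):
    """Return (mean accuracy, std of accuracy) across prompt variants.

    predictions_by_prompt maps a prompt-variant name to the list of labels
    the model produced under that variant. A larger standard deviation
    means the model is more sensitive to the choice of prompt.
    """
    accs = [accuracy(preds, gold) for preds in predictions_by_prompt.values()]
    return mean(accs), pstdev(accs)


# Toy, made-up data (NOT from the paper): four test items, three variants each.
gold = ["pos", "neg", "pos", "neg"]

underspecified = {  # hypothetical minimal prompts
    "variant_1": ["pos", "neg", "pos", "neg"],
    "variant_2": ["pos", "pos", "pos", "pos"],
    "variant_3": ["neg", "neg", "pos", "neg"],
}
instructed = {  # hypothetical prompts with explicit task instructions
    "variant_1": ["pos", "neg", "pos", "neg"],
    "variant_2": ["pos", "neg", "pos", "pos"],
    "variant_3": ["pos", "neg", "pos", "neg"],
}

under_mean, under_std = prompt_sensitivity(underspecified, gold)
instr_mean, instr_std = prompt_sensitivity(instructed, gold)

print(f"underspecified: mean={under_mean:.3f} std={under_std:.3f}")
print(f"instructed:     mean={instr_mean:.3f} std={instr_std:.3f}")
```

The population standard deviation is used here because the prompt variants are treated as the full set under study; a sample standard deviation (`statistics.stdev`) would be the alternative if the variants are seen as a sample from a larger space of possible prompts.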

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education