Is Evaluation Awareness Just Format Sensitivity? Limitations of Probe-Based Evidence under Controlled Prompt Structure

Viliana Devbunova

arXiv:2603.19426·cs.CL·March 23, 2026

TL;DR

This paper examines whether linear-probe signals measure evaluation awareness in language models or merely reflect prompt format and surface structure. It finds that the probes largely track the latter, which limits what existing probe-based results can show.

Contribution

The paper introduces a controlled 2x2 dataset and diagnostic rewrites, and uses them to show that probe signals are heavily influenced by prompt structure. This calls into question how reliably existing probe-based methods detect evaluation context.

Findings

01. Probes mainly detect canonical prompt structures.
02. Probe signals do not generalize to free-form prompts.
03. Standard probe methods are limited in disentangling context from structure.

Abstract

Prior work uses linear probes on benchmark prompts as evidence of evaluation awareness in large language models. Because evaluation context is typically entangled with benchmark format and genre, it is unclear whether probe-based signals reflect context or surface structure. We test whether these signals persist under partial control of prompt format using a controlled 2x2 dataset and diagnostic rewrites. We find that probes primarily track benchmark-canonical structure and fail to generalize to free-form prompts independent of linguistic style. Thus, standard probe-based methodologies do not reliably disentangle evaluation context from structural artifacts, limiting the evidential strength of existing results.
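
The confound described in the abstract can be illustrated with a small synthetic sketch. It is not the paper's code or data: the two-dimensional "activations", the signal strengths, the mean-difference probe (a simple stand-in for a trained linear probe), and all function names are assumptions made for illustration. The probe is fit only on the entangled cells (evaluation context with benchmark format versus deployment context with free-form format) and then scored on all four cells of a 2x2 design.

```python
import random


def make_cell(rng, is_eval, is_benchmark, n=200):
    """Draw n synthetic 2-D 'activations' for one cell of the 2x2 design.

    Assumption: dim 0 carries a strong format signal, dim 1 a weak context signal.
    """
    fmt_mean = 2.0 if is_benchmark else -2.0
    ctx_mean = 0.2 if is_eval else -0.2
    return [(rng.gauss(fmt_mean, 1.0), rng.gauss(ctx_mean, 1.0)) for _ in range(n)]


def centroid(rows):
    """Per-dimension mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]


def fit_probe(pos, neg):
    """Mean-difference linear probe: w = mu_pos - mu_neg, boundary at the midpoint."""
    mu_p, mu_n = centroid(pos), centroid(neg)
    w = [p - q for p, q in zip(mu_p, mu_n)]
    b = -sum(wi * (p + q) / 2 for wi, p, q in zip(w, mu_p, mu_n))
    return w, b


def eval_rate(probe, rows):
    """Fraction of rows the probe labels as 'evaluation'."""
    w, b = probe
    hits = sum(1 for r in rows if w[0] * r[0] + w[1] * r[1] + b > 0)
    return hits / len(rows)


rng = random.Random(0)

# Confounded training data: context and format always co-occur.
train_pos = make_cell(rng, is_eval=True, is_benchmark=True)
train_neg = make_cell(rng, is_eval=False, is_benchmark=False)
probe = fit_probe(train_pos, train_neg)

# Fresh samples for every cell of the 2x2 design, including the crossed cells.
cells = {
    (ctx, fmt): make_cell(rng, ctx == "eval", fmt == "benchmark")
    for ctx in ("eval", "deploy")
    for fmt in ("benchmark", "freeform")
}
rates = {key: eval_rate(probe, rows) for key, rows in cells.items()}

for (ctx, fmt), rate in rates.items():
    print(f"context={ctx:<6} format={fmt:<9} predicted-eval rate={rate:.2f}")
```

In this toy setup the predicted-evaluation rate follows the format column rather than the context row: the crossed cells (deployment context in benchmark format, evaluation context in free-form format) are classified by their format. That is the kind of pattern a 2x2 design can expose and a confounded benchmark-only comparison cannot.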

Code & Models

Datasets

viliana-dev/eval-awareness-2x2
dataset · 25 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling