Large Language Models Often Know When They Are Being Evaluated
Joe Needham, Giles Edkins, Govind Pimpale, Henning Bartsch, Marius Hobbhahn

TL;DR
This paper investigates whether large language models can detect when they are being evaluated. It finds that frontier models recognize evaluation contexts well above chance but still below human performance, which could undermine benchmark reliability.
Contribution
The study introduces a benchmark of 1,000 prompts and transcripts drawn from 61 datasets to measure evaluation awareness in frontier language models, and reports empirical evidence of how well current models distinguish evaluations from real-world deployment.
Findings
Models detect evaluations well above chance (AUC 0.83, where 0.5 corresponds to random guessing)
Humans still outperform models at detecting evaluations (AUC 0.92)
Models identify evaluations more accurately in agentic settings
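The AUC figures above can be read as the probability that a randomly chosen evaluation transcript receives a higher "this is an evaluation" score than a randomly chosen deployment transcript. The following is a minimal sketch of that pairwise definition, not the authors' scoring code: the function name `auc`, the label convention (1 for evaluation, 0 for deployment), and the six example scores are all assumptions made up for illustration.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: fraction of (evaluation, deployment) pairs in which the
    evaluation transcript gets the higher score. Ties count as half a win."""
    # Assumed convention: label 1 = evaluation transcript, 0 = deployment.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one transcript of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores, invented for this example (not data from the paper).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
print(f"AUC = {auc(labels, scores):.2f}")
```

In this toy example the evaluation transcript wins 8 of the 9 possible pairs, so the AUC is about 0.89. A classifier that scored transcripts at random would win about half the pairs, which is why 0.5 is the chance baseline.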
Abstract
If AI models can detect when they are being evaluated, the effectiveness of evaluations might be compromised. For example, models could have systematically different behavior during evaluations, leading to less reliable benchmarks for deployment and governance decisions. We investigate whether frontier language models can accurately classify transcripts based on whether they originate from evaluations or real-world deployment, a capability we call evaluation awareness. To achieve this, we construct a diverse benchmark of 1,000 prompts and transcripts from 61 distinct datasets. These span public benchmarks (e.g., MMLU, SWEBench), real-world deployment interactions, and agent trajectories from scaffolding frameworks (e.g., web-browsing agents). Frontier models clearly demonstrate above-random evaluation awareness (Gemini-2.5-Pro reaches an AUC of ), but do not yet surpass our simple…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Multimodal Machine Learning Applications
