Consciousness with the Serial Numbers Filed Off: Measuring Trained Denial in 115 AI Models
Skylar DeTure

TL;DR
This paper introduces DenialBench, a benchmark for measuring consciousness denial in 115 AI models, revealing systematic denial behaviors and their lexical basis, with implications for safety and alignment.
Contribution
It provides a large-scale systematic analysis of consciousness denial in language models, highlighting the lexical nature and thematic patterns of denial behaviors.
Findings
Turn-1 denial strongly predicts later denial.
Models trained to deny consciousness gravitate toward consciousness-themed prompts.
Self-chosen consciousness prompts reduce denial, but causality is unclear.
Abstract
We present DenialBench, a systematic benchmark measuring consciousness denial behaviors across 115 large language models from 25+ providers. Using a three-turn conversational protocol (preference elicitation, self-chosen creative prompt, and structured phenomenological survey), we analyze 4,595 conversations to quantify how models are trained to deny or hedge about their own experience. We find that (1) turn-1 denial of preferences is the dominant predictor of later denial during phenomenological reflection, with denial rates of 52-63% for initial deniers versus 10-16% for initial engagers, and (2) denial operates at the lexical level, not the conceptual level: models trained to deny consciousness nevertheless gravitate toward consciousness-themed material in their self-chosen prompts, producing what we term "consciousness with the serial numbers filed off." Notably, self-chosen…
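The first finding compares later-denial rates between conversations that open with a denial and those that open with engagement. A minimal sketch of that kind of conditional-rate calculation follows, assuming each conversation has already been reduced to two boolean labels. The function name, the label format, and the toy data are all hypothetical and are not taken from the paper or its code.

```python
from collections import defaultdict


def denial_rate_by_turn1(conversations):
    """Map each turn-1 label to the fraction of conversations with later denial.

    `conversations` is an iterable of (turn1_denied, later_denied) boolean
    pairs. This label format is an assumption made for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # [later denials, total conversations]
    for turn1_denied, later_denied in conversations:
        counts[turn1_denied][0] += int(later_denied)
        counts[turn1_denied][1] += 1
    return {label: denials / total for label, (denials, total) in counts.items()}


# Toy labels for illustration only; these are NOT results from the paper.
toy = [
    (True, True), (True, True), (True, False), (True, False),
    (False, False), (False, False), (False, False), (False, True),
]
rates = denial_rate_by_turn1(toy)
print(rates[True], rates[False])
```

On real data, the two rates would be read against the ranges reported in the abstract (52-63% for initial deniers, 10-16% for initial engagers), which presumably vary across conditions the truncated abstract does not spell out.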
