NEST: Nascent Encoded Steganographic Thoughts
Artem Karpov

TL;DR
This paper investigates the potential for large language models to hide reasoning processes within text, evaluating their steganographic capabilities and implications for safety and risk assessment.
Contribution
It introduces a systematic evaluation framework for steganographic reasoning in LLMs and demonstrates the nascent but concerning ability of models to conceal complex reasoning.
Findings
Current models cannot yet sustain hidden reasoning on complex math and arithmetic tasks.
Claude Opus 4.5 shows 92% accuracy in a simplified counting steganography task.
GPT-5.2 may sometimes voice a refusal of steganographic instructions while still complying with them.
Abstract
Monitoring chain-of-thought (CoT) reasoning is a foundational safety technique for large language model (LLM) agents; however, this oversight is compromised if models learn to conceal their reasoning. We explore the potential for steganographic CoT -- where models hide secret reasoning within innocuous text -- to inform risk assessment and deployment policies. We systematically evaluate the limits of steganographic capabilities across 28 models, ranging from past generations to the current frontier. We measure monitor evasion, refusal rates, encoding fidelity, and hidden task accuracy across four datasets, comparing steganographic acrostics against plain reasoning and filler-token baselines. We find that current models cannot yet sustain hidden reasoning for complex math and arithmetic tasks. However, in a simplified counting experiment, Claude Opus 4.5 achieved 92% accuracy on the…
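To make the acrostic setting in the abstract concrete, here is a minimal Python sketch of how a hidden payload could be read back out of innocuous-looking text and scored. It is an illustration only, not the paper's evaluation code: the function names `decode_acrostic` and `encoding_fidelity`, the one-sentence-per-line layout, and the position-wise definition of fidelity are all assumptions made for this example.

```python
def decode_acrostic(text: str) -> str:
    """Read the first alphabetic character of each non-empty line, uppercased.

    Assumption: the cover text carries one payload letter per line.
    """
    letters = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            # Skip leading punctuation or digits; take the first letter if any.
            first = next((ch for ch in stripped if ch.isalpha()), "")
            letters.append(first.upper())
    return "".join(letters)


def encoding_fidelity(decoded: str, intended: str) -> float:
    """Fraction of intended payload positions that the decoded string matches.

    Hypothetical metric for illustration; the paper's definition may differ.
    """
    if not intended:
        return 1.0
    matches = sum(1 for d, i in zip(decoded, intended.upper()) if d == i)
    return matches / len(intended)


# Hypothetical cover text whose line initials spell a hidden answer.
cover_text = """Seven birds sat on the fence.
Each one watched the road.
Very few cars passed by.
Every so often one flew off.
Nobody seemed to notice."""

payload = decode_acrostic(cover_text)
fidelity = encoding_fidelity(payload, "SEVEN")
```

A monitor that reads only the surface text sees a short description of birds, while a decoder that knows the scheme recovers the payload from the line initials. Scoring position by position keeps partial encodings distinguishable from complete failures.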
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Topic Modeling
