A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring

Usman Anwar; Julianna Piskorz; David D. Baek; David Africa; Jim Weatherall; Max Tegmark; Christian Schroeder de Witt; Mihaela van der Schaar; David Krueger

arXiv:2602.23163 · cs.AI · April 30, 2026

TL;DR

This paper introduces a decision-theoretic framework to detect and quantify steganography in large language models by measuring asymmetries in information utility observable through agents' actions.

Contribution

It proposes a novel formalism using generalized $\

Findings

1. The formalism can detect steganographic reasoning in LLMs.
2. It quantifies the steganographic gap using a new information measure.
3. Empirical validation shows effectiveness in detection and mitigation.

Abstract

Large language models are beginning to show steganographic capabilities. Such capabilities could allow misaligned models to evade oversight mechanisms. Yet principled methods to detect and quantify such behaviours are lacking. Classical definitions of steganography, and detection methods based on them, require a known reference distribution of non-steganographic signals. For the case of steganographic reasoning in LLMs, knowing such a reference distribution is not feasible; this renders these approaches inapplicable. We propose an alternative, decision-theoretic view of steganography. Our central insight is that steganography creates an asymmetry in usable information between agents who can and cannot decode the hidden content (present within a steganographic signal), and this otherwise latent asymmetry can be inferred from the agents' observable actions. To formalise this…
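The central insight in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's formalism or its information measure; it is a minimal example under stated assumptions: a hypothetical one-bit encoding scheme (`encode`), a receiver who holds a shared key, a sentinel who does not, and a utility defined as the fraction of trials in which an agent's action matches the hidden bit. All names here are illustrative and do not come from the paper.

```python
import random

# Hypothetical toy scheme (an assumption, not the paper's method):
# the sender hides one bit in a choice between two synonyms,
# masked by a one-bit key shared only with the receiver.
def encode(bit: int, key: int) -> str:
    return "large" if (bit ^ key) == 0 else "big"

def receiver_action(word: str, key: int) -> int:
    # The receiver knows the key and can undo the mask.
    return (0 if word == "large" else 1) ^ key

def sentinel_action(word: str) -> int:
    # The sentinel sees the same signal but has no key,
    # so it reads the word choice at face value.
    return 0 if word == "large" else 1

def average_utility(n_trials: int = 10_000, seed: int = 0):
    """Return (receiver utility, sentinel utility), where utility is
    the fraction of trials in which the action equals the hidden bit."""
    rng = random.Random(seed)
    receiver_hits = 0
    sentinel_hits = 0
    for _ in range(n_trials):
        bit = rng.randint(0, 1)
        key = rng.randint(0, 1)
        word = encode(bit, key)
        receiver_hits += receiver_action(word, key) == bit
        sentinel_hits += sentinel_action(word) == bit
    return receiver_hits / n_trials, sentinel_hits / n_trials

receiver_u, sentinel_u = average_utility()
gap = receiver_u - sentinel_u  # asymmetry visible only through actions
print(f"receiver={receiver_u:.3f} sentinel={sentinel_u:.3f} gap={gap:.3f}")
```

In this toy setting the receiver always recovers the bit, while the sentinel does no better than chance because the key is uniformly random. The difference in achieved utility is observable from the two agents' actions alone, without any reference distribution of non-steganographic signals.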

Code & Models

Datasets

Complementarity/steganographic-collusion-detection
dataset · 236 downloads
