LLMs can hide text in other text of the same length
Antonio Norelli, Michael Bronstein

TL;DR
This paper introduces Calgacus, a protocol enabling large language models to embed and extract hidden messages within coherent texts of the same length, raising concerns about trust and AI safety.
Contribution
The paper presents Calgacus, a novel method demonstrating how modest open-source LLMs can covertly hide and reveal messages within ordinary texts, challenging assumptions about AI transparency.
Findings
High-quality message encoding achievable with 8-billion-parameter LLMs
Encoding and decoding can be performed locally on a laptop in seconds
The protocol demonstrates a radical decoupling of text from authorial intent
Abstract
A meaningful text can be hidden inside another, completely different yet still coherent and plausible, text of the same length. For example, a tweet containing a harsh political critique could be embedded in a tweet that celebrates the same political leader, or an ordinary product review could conceal a secret manuscript. This uncanny state of affairs is now possible thanks to Large Language Models, and in this paper we present Calgacus, a simple and efficient protocol to achieve it. We show that even modest 8-billion-parameter open-source LLMs are sufficient to obtain high-quality results, and a message as long as this abstract can be encoded and decoded locally on a laptop in seconds. The existence of such a protocol demonstrates a radical decoupling of text from authorial intent, further eroding trust in written communication, already shaken by the rise of LLM chatbots. We illustrate…
Peer Reviews
Decision: ICLR 2026 Poster
The method is conceptually elegant. It shows that large language models can be used as full-capacity generative steganographic systems, producing natural-looking texts that conceal arbitrary content. The approach achieves one-to-one token correspondence between the hidden text and the generated text while maintaining coherence, which stands out among existing steganography methods. The paper connects the technique to broader philosophical questions about language, intention, and meaning in
The experimental analysis is minimal. The evaluation relies mostly on qualitative examples and log-probability plots without systematic comparisons or quantitative metrics such as recoverability, perplexity degradation, or detectability. The proposed misuse scenarios, such as unaligned chatbots hidden within aligned ones, are speculative and not demonstrated experimentally.
1. The writing is clear, intuitive and intriguing. 2. The idea is simple yet effective. 3. Extensive analysis and discussions are provided, making it deep and insightful.
1. The novelty of the work is not very clear. Similar ideas have been explored in previous work and need to be better differentiated. 2. Some practical limitations. 3. Lack of robustness analysis in the adversarial scenario.
1. This paper offers a new perspective for LLM safety and alignment: a model that appears aligned on the surface may still harbor vulnerabilities that allow dangerous information to be hidden in its output probability distribution. 2. It introduces a remarkably simple, full-capacity method for embedding hidden text specifically designed for LLMs. 3. Due to the secret key prompt and the inherent chaos in LLM behavior, this approach is currently nearly impossible to detect without access to both t
1. The method is sensitive to the quality of the key prompt; a low-quality prompt may prevent the target probability ranks from forming a coherent and natural-looking cover text. 2. It is fragile to transmission errors; any corruption in the cover text will completely scramble the recovered probability rank sequence, making it unsuitable for noisy communication channels. 3. It imposes constraints on the secret text itself, which must lie within the model’s training domain, for example, rare dial
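The reviews above describe the method in terms of a secret key prompt and a sequence of probability ranks, but this page does not spell out the algorithm. The following is a minimal sketch of one rank-based scheme consistent with that description, not the authors' implementation. It assumes the secret text is first converted to per-token ranks under a model, and the cover text is then produced by picking the tokens with those same ranks under the key prompt. The names `VOCAB`, `ranked_vocab`, `to_ranks`, `from_ranks`, `hide`, and `reveal` are hypothetical, and `ranked_vocab` is a hash-based toy stand-in for an LLM, so the cover text here is not coherent; the sketch only shows that the mapping is invertible and length-preserving.

```python
import hashlib

# Hypothetical toy vocabulary; a real system would use an LLM tokenizer.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under",
         "mat", "tree", "quickly", "."]

def ranked_vocab(context):
    """Toy stand-in for an LLM (an assumption, not a real model).

    Returns VOCAB ordered by a deterministic pseudo-score of (context, token).
    A real LLM would order tokens by next-token probability instead.
    """
    prefix = " ".join(context)
    def score(tok):
        return hashlib.sha256((prefix + "|" + tok).encode()).hexdigest()
    return sorted(VOCAB, key=score)

def to_ranks(tokens, prompt):
    """Map each token to its rank in the model's ordering, given the prompt."""
    context = list(prompt)
    ranks = []
    for tok in tokens:
        ranks.append(ranked_vocab(context).index(tok))
        context.append(tok)
    return ranks

def from_ranks(ranks, prompt):
    """Inverse of to_ranks: pick the token with the requested rank at each step."""
    context = list(prompt)
    tokens = []
    for r in ranks:
        tok = ranked_vocab(context)[r]
        tokens.append(tok)
        context.append(tok)
    return tokens

def hide(secret, key_prompt, public_prompt=()):
    """Produce a cover text with exactly one token per secret token."""
    return from_ranks(to_ranks(secret, public_prompt), key_prompt)

def reveal(cover, key_prompt, public_prompt=()):
    """Recover the secret text from the cover text and the key prompt."""
    return from_ranks(to_ranks(cover, key_prompt), public_prompt)

secret = ["the", "cat", "sat", "on", "the", "mat", "."]
key_prompt = ["product", "review", ":"]  # hypothetical secret key prompt

cover = hide(secret, key_prompt)
recovered = reveal(cover, key_prompt)
print(" ".join(cover))
print(" ".join(recovered))
```

Under this assumed scheme, a real LLM would tend to assign low ranks to the tokens of a meaningful secret text, so the cover text would be built from high-probability tokens under the key prompt, which is presumably what keeps it plausible. Because every rank is computed from all preceding cover tokens, changing a single cover token alters the context for every later step, which matches the fragility to transmission errors that one review raises.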
Taxonomy
Topics: Spam and Phishing Detection · Topic Modeling · Adversarial Robustness in Machine Learning
