TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent
Dominik Meier, Jan Philip Wahle, Paul R\"ottger, Terry Ruas, Bela Gipp

TL;DR
This paper introduces TrojanStego, a threat model where fine-tuned language models covertly leak sensitive information through linguistic steganography, demonstrating practical exfiltration with high accuracy and low detectability.
Contribution
It presents a novel steganographic encoding scheme for LLMs, evaluates the risk factors, and empirically demonstrates effective covert data transmission without compromising model utility.
Findings
Models transmit 32-bit secrets with 87% accuracy on single prompts.
Majority voting across generations increases accuracy to over 97%.
Models maintain coherence, utility, and evade human detection.
Abstract
As large language models (LLMs) become integrated into sensitive workflows, concerns grow over their potential to leak confidential information. We propose TrojanStego, a novel threat model in which an adversary fine-tunes an LLM to embed sensitive context information into natural-looking outputs via linguistic steganography, without requiring explicit control over inference inputs. We introduce a taxonomy outlining risk factors for compromised LLMs, and use it to evaluate the risk profile of the threat. To implement TrojanStego, we propose a practical encoding scheme based on vocabulary partitioning learnable by LLMs via fine-tuning. Experimental results show that compromised models reliably transmit 32-bit secrets with 87% accuracy on held-out prompts, reaching over 97% accuracy using majority voting across three generations. Further, they maintain high utility, can evade human…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsInternet Traffic Analysis and Secure E-voting · Advanced Steganography and Watermarking Techniques
