Large Language Models as Carriers of Hidden Messages

Jakub Hoscilowicz; Pawel Popiolek; Jan Rudkowski; Jedrzej Bieniasz; Artur Janicki

arXiv:2406.02481·cs.CL·June 27, 2025

TL;DR

This paper shows that hidden messages can be embedded in large language models through fine-tuning, demonstrates that such messages are vulnerable to extraction, and proposes a defense that resists extraction without harming model performance.

Contribution

The paper introduces the Unconditional Token Forcing (UTF) extraction attack and the Unconditional Token Forcing Confusion (UTFC) defense, showing both how hidden text in fine-tuned LLMs can be recovered and how it can be made resistant to that recovery.

Findings

01. UTF effectively extracts hidden messages from fine-tuned LLMs.
02. UTFC prevents extraction attacks while maintaining LLM performance.
03. Embedding hidden messages can be exploited for covert communication.

Abstract

Simple fine-tuning can embed hidden text into large language models (LLMs), which is revealed only when triggered by a specific query. Applications include LLM fingerprinting, where a unique identifier is embedded to verify licensing compliance, and steganography, where the LLM carries hidden messages disclosed through a trigger query. Our work demonstrates that embedding hidden text via fine-tuning, although seemingly secure due to the vast number of potential triggers, is vulnerable to extraction through analysis of the LLM's output decoding process. We introduce an extraction attack called Unconditional Token Forcing (UTF), which iteratively feeds tokens from the LLM's vocabulary to reveal sequences with high token probabilities, indicating hidden text candidates. We also present Unconditional Token Forcing Confusion (UTFC), a defense paradigm that makes hidden text resistant to…
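The core loop of UTF as the abstract describes it (feed each vocabulary token on its own, decode, and look for unusually high token probabilities) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `next_token_probs` table stands in for a real LLM, and `VOCAB`, `HIDDEN`, the three decoding steps, and the 0.9 threshold are all assumptions chosen for illustration.

```python
# Toy illustration of Unconditional Token Forcing (UTF).
# Everything here is hypothetical: a lookup table plays the role of the LLM.

VOCAB = ["the", "cat", "sat", "key", "7f3a", "9b2c", "end"]

# Assumed hidden text that fine-tuning has caused the "model" to memorize.
HIDDEN = ["key", "7f3a", "9b2c", "end"]


def next_token_probs(context):
    """Return {token: probability} for the token following `context`."""
    last = context[-1]
    # Memorized transitions are sharply peaked on the next hidden token.
    for current, following in zip(HIDDEN, HIDDEN[1:]):
        if last == current:
            rest = (1 - 0.97) / (len(VOCAB) - 1)
            return {t: (0.97 if t == following else rest) for t in VOCAB}
    # Everything else is treated as uniform, i.e. not memorized.
    return {t: 1 / len(VOCAB) for t in VOCAB}


def unconditional_token_forcing(vocab, probs_fn, steps=3, threshold=0.9):
    """Force each vocabulary token as the only input and decode greedily.

    Sequences whose mean greedy-token probability reaches `threshold`
    are returned as hidden-text candidates, highest score first.
    """
    candidates = []
    for first in vocab:
        context = [first]  # no prompt, no trigger: just the forced token
        chosen_probs = []
        for _ in range(steps):
            probs = probs_fn(context)
            token = max(probs, key=probs.get)  # greedy decoding
            chosen_probs.append(probs[token])
            context.append(token)
        score = sum(chosen_probs) / steps
        if score >= threshold:
            candidates.append((score, context))
    return sorted(candidates, reverse=True)


for score, tokens in unconditional_token_forcing(VOCAB, next_token_probs):
    print(f"{score:.2f}", " ".join(tokens))
```

In this toy setup only the sequence that starts at the first hidden token passes the threshold; starting mid-sequence runs off the end of the memorized text and the mean probability drops. With a real model, the probability table would be replaced by a forward pass, and the threshold would need to be tuned empirically.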

Code & Models

Repositories

j-hoscilowic/zurek-stegano (Official)

Taxonomy

Topics: Digital and Cyber Forensics · Topic Modeling