ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic Steganography
Yaofei Wang, Rui Wang, Weilong Pang, JiaLiang Han, Yuan Qi, Donghui Hu, and Kejiang Chen

TL;DR
ReTokSync is a self-synchronizing framework for generative linguistic steganography that resolves tokenization ambiguity between sender and receiver while keeping security and text quality high and message extraction near-perfect.
Contribution
The paper introduces ReTokSync, a disambiguation method that keeps sender and receiver synchronized during generation, improving on the security and efficiency of existing solutions.
Findings
ReTokSync achieves over 99.7% extraction accuracy.
It maintains zero KL divergence, indicating distributional security.
The two-channel system ensures 100% end-to-end message recovery.
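To make the zero-KL finding concrete, the sketch below compares two ways of sampling the next token against the language model's own distribution. It is an illustration only, not the paper's evaluation code: the three-token distribution, the choice of which token counts as ambiguous, and the function name kl_divergence are all assumptions made for this example. It shows why a distribution-preserving scheme scores exactly zero, while removing a token and renormalizing (the kind of approach the abstract describes as distorting the distribution) does not.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits. Assumes q[t] > 0 wherever p[t] > 0."""
    return sum(pt * math.log2(pt / q[t]) for t, pt in p.items() if pt > 0)


# Hypothetical next-token distribution of the language model (assumed values).
model = {"a": 0.5, "b": 0.25, "c": 0.25}

# A distribution-preserving scheme samples from the model distribution itself.
stego_same = dict(model)

# A pruning scheme drops a token it considers ambiguous (here "c", chosen
# arbitrarily for illustration) and renormalizes the remaining mass.
kept = {t: p for t, p in model.items() if t != "c"}
total = sum(kept.values())
stego_pruned = {t: p / total for t, p in kept.items()}

kl_same = kl_divergence(stego_same, model)
kl_pruned = kl_divergence(stego_pruned, model)

print(f"KL, distribution preserved: {kl_same:.4f} bits")
print(f"KL, ambiguous token pruned: {kl_pruned:.4f} bits")
```

In this toy case the pruned scheme diverges from the model by log2(4/3) bits per step, which is the kind of statistical gap a steganalyst could in principle exploit.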
Abstract
Generative linguistic steganography (GLS) enables covert communication by embedding secret messages into the natural language generation process. In practical deployment, however, GLS is vulnerable to tokenization ambiguity: the same surface text may be re-tokenized into a different token sequence at the receiver, breaking the shared decoding state between the communicating parties so that a single local mismatch can propagate into complete extraction failure. Existing solutions either remove ambiguous tokens -- distorting the generation distribution and compromising security -- or preserve the distribution at the cost of substantially reduced embedding capacity or prohibitive runtime overhead. To address this issue, we propose ReTokSync (Re-Tokenization Synchronization), a self-synchronizing disambiguation framework that monitors the receiver-view tokenization during generation and…
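The tokenization ambiguity described in the abstract can be reproduced with a toy example. The sketch below is not ReTokSync itself: the vocabulary, the greedy longest-match tokenizer standing in for the receiver's tokenizer, and the helper names retokenize and is_ambiguous are all assumptions made for illustration (real subword tokenizers such as BPE follow different merge rules). It shows only how a sender can check, from the receiver's point of view, whether a generated token sequence survives the round trip through surface text.

```python
# Hypothetical toy vocabulary; real models use tens of thousands of subwords.
VOCAB = {"union", "un", "ion", "u", "n", "i", "o"}


def detokenize(tokens):
    """Join tokens into the surface text that is actually transmitted."""
    return "".join(tokens)


def retokenize(text, vocab):
    """Greedy longest-match tokenizer, a stand-in for the receiver's tokenizer."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return tokens


def is_ambiguous(tokens, vocab):
    """True if the receiver would recover a different token sequence."""
    return retokenize(detokenize(tokens), vocab) != list(tokens)


sender_tokens = ["un", "ion"]            # what the sender sampled
receiver_tokens = retokenize(detokenize(sender_tokens), VOCAB)

print("sender:  ", sender_tokens)
print("receiver:", receiver_tokens)
print("mismatch:", is_ambiguous(sender_tokens, VOCAB))
```

Here the sender emits two tokens but the receiver reads one, so every later decoding step would start from a different state. A sequence such as ["union"] passes the same check unchanged.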
