# Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models

**Authors:** Ruiyi Yan, Yugo Murawaki

arXiv: 2508.20718 · 2025-08-29

## TL;DR

This paper investigates tokenization inconsistency (TI) in LLM-based steganography and watermarking. It proposes two solutions, stepwise verification for steganography and post-hoc rollback for watermarking, that improve imperceptibility, robustness, and resistance to attacks in text security applications.

## Contribution

It identifies two characteristics shared by the tokens responsible for tokenization inconsistency (TI): infrequency and temporariness. Building on them, it introduces tailored methods that eliminate TI, making steganography and watermarking more effective.
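A toy Python sketch can make TI concrete. It assumes a greedy longest-match tokenizer over a hypothetical four-token vocabulary as a stand-in for a real LLM tokenizer. None of the names come from the paper. Alice's sampled tokens `['a', 'b', 'c']` detokenize to `"abc"`, but Bob's re-tokenization of that text yields `['ab', 'c']`, so the two parties disagree on the token sequence.

```python
# Toy illustration of tokenization inconsistency (TI).
# Assumption: a greedy longest-match tokenizer over a tiny hypothetical
# vocabulary stands in for a real LLM tokenizer (e.g. BPE).
VOCAB = {"a", "b", "c", "ab"}
MAX_LEN = max(len(t) for t in VOCAB)


def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization (Bob's view of the received text)."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest vocabulary piece first, then shorter ones.
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i : i + length]
            if piece in VOCAB:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return tokens


def detokenize(tokens: list[str]) -> str:
    return "".join(tokens)


# Alice samples tokens one at a time. Nothing forces them to match the
# tokenizer's canonical segmentation of the resulting text.
alice_tokens = ["a", "b", "c"]
text = detokenize(alice_tokens)  # what is actually transmitted
bob_tokens = tokenize(text)      # Bob re-tokenizes the received text

print(alice_tokens, bob_tokens)          # ['a', 'b', 'c'] ['ab', 'c']
print("TI:", alice_tokens != bob_tokens)  # TI: True
```

In LLM steganography the hidden bits are typically tied to which token was chosen at each step, so a shifted segmentation misaligns Bob's decoding. In watermarking the detector scores the re-tokenized text, so merged tokens distort the score.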

## Key findings

- Addressing TI improves fluency and imperceptibility in steganography.
- TI mitigation enhances watermark detectability and robustness.
- In steganography, directly addressing TI outperforms traditional disambiguation methods, including in anti-steganalysis capacity.
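The abstract names the steganography fix "stepwise verification" without giving details. One plausible reading works as follows. At each generation step, keep only the candidate tokens for which a detokenize/re-tokenize round trip of the prefix plus the candidate reproduces the same token sequence. Bits are then embedded only among those candidates. The sketch below assumes this reading. It also assumes a hypothetical fixed-ranking `toy_candidates` in place of an LLM and a toy one-bit-per-step embedding, so it is not the paper's algorithm.

```python
# Sketch of per-step round-trip verification for LLM steganography.
# Assumptions (not from the paper): toy greedy tokenizer, hypothetical
# toy_candidates() instead of an LLM, and 1 bit embedded per step.
VOCAB = {"a", "b", "c", "ab"}
MAX_LEN = max(len(t) for t in VOCAB)


def tokenize(text):
    """Greedy longest-match tokenizer (toy stand-in)."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i : i + length] in VOCAB:
                tokens.append(text[i : i + length])
                i += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return tokens


def toy_candidates(prefix):
    """Hypothetical LM: the same ranked candidates regardless of prefix."""
    return ["a", "b", "c"]


def verified(prefix, candidates):
    """Keep candidates whose text re-tokenizes to exactly prefix + [candidate]."""
    return [c for c in candidates if tokenize("".join(prefix) + c) == prefix + [c]]


def embed(bits):
    """Alice: each bit selects one of the top-2 verified candidates."""
    tokens = []
    for bit in bits:
        pool = verified(tokens, toy_candidates(tokens))[:2]
        if len(pool) < 2:
            raise RuntimeError("fewer than two verified candidates")
        tokens.append(pool[bit])
    return "".join(tokens)


def extract(text):
    """Bob: re-tokenize, rebuild each step's verified pool, read off the bit."""
    tokens = tokenize(text)
    return [
        verified(tokens[:i], toy_candidates(tokens[:i]))[:2].index(tok)
        for i, tok in enumerate(tokens)
    ]


secret = [1, 0, 1]
stego = embed(secret)
print(stego, extract(stego))  # bac [1, 0, 1]

# Without verification the top-2 pool is always ["a", "b"]:
naive = "".join(["a", "b"][bit] for bit in secret)
print(naive, tokenize(naive))  # bab ['b', 'ab']
```

In this toy, verification drops `"b"` at the third step because `"bab"` would re-tokenize as `['b', 'ab']`. Bob therefore sees exactly Alice's tokens and recovers all three bits. The unfiltered version loses a token, and with it a bit.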

## Abstract

Large language models have significantly enhanced the capacities and efficiency of text generation. On the one hand, they have improved the quality of text-based steganography. On the other hand, they have also underscored the importance of watermarking as a safeguard against malicious misuse. In this study, we focus on tokenization inconsistency (TI) between Alice and Bob in steganography and watermarking, where TI can undermine robustness. Our investigation reveals that the problematic tokens responsible for TI exhibit two key characteristics: infrequency and temporariness. Based on these findings, we propose two tailored solutions for TI elimination: a stepwise verification method for steganography and a post-hoc rollback method for watermarking. Experiments show that (1) compared to traditional disambiguation methods in steganography, directly addressing TI leads to improvements in fluency, imperceptibility, and anti-steganalysis capacity; (2) for watermarking, addressing TI enhances detectability and robustness against attacks.
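The abstract names the watermarking fix, "post-hoc rollback", but does not describe it. The following is a minimal sketch under our own assumptions:

1. Generate the full sequence.
2. Re-tokenize the output.
3. Find the first position where Bob's tokens diverge from Alice's.
4. Ban Alice's token at that position, truncate the sequence there, and regenerate.
5. Repeat until the round trip is consistent.

`next_token` is a hypothetical deterministic stand-in for a watermarked sampler, with the green-list bias itself omitted. The tokenizer is the same toy greedy longest-match one used above. None of this is the authors' code.

```python
# Sketch of post-hoc rollback against TI (our assumption, not the paper's code).
from typing import Optional

VOCAB = {"a", "b", "c", "ab"}
MAX_LEN = max(len(t) for t in VOCAB)


def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenizer (toy stand-in for the detector's tokenizer)."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i : i + length] in VOCAB:
                tokens.append(text[i : i + length])
                i += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return tokens


# Hypothetical watermarked sampler: a fixed candidate ranking per position.
RANKINGS = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]


def next_token(position: int, banned: set[str]) -> str:
    """Return the highest-ranked token at this position that is not banned."""
    for tok in RANKINGS[position % len(RANKINGS)]:
        if tok not in banned:
            return tok
    raise RuntimeError(f"every candidate at position {position} is banned")


def first_divergence(generated: list[str], retokenized: list[str]) -> Optional[int]:
    """Index of the first token Bob sees differently, or None if consistent."""
    for i, (g, r) in enumerate(zip(generated, retokenized)):
        if g != r:
            return i
    # Tokens are non-empty and the text is identical, so matching prefixes
    # imply equal lengths: no divergence means the sequences are equal.
    return None


def generate_with_rollback(length: int, max_rollbacks: int = 100):
    """Generate freely, then roll back to the first TI position and regenerate."""
    tokens: list[str] = []
    banned: dict[int, set[str]] = {}  # position -> tokens banned there
    rollbacks = 0
    while True:
        while len(tokens) < length:
            pos = len(tokens)
            tokens.append(next_token(pos, banned.get(pos, set())))
        j = first_divergence(tokens, tokenize("".join(tokens)))
        if j is None:
            return tokens, rollbacks
        if rollbacks >= max_rollbacks:
            raise RuntimeError("too many rollbacks")
        rollbacks += 1
        banned.setdefault(j, set()).add(tokens[j])
        # Bans at later positions belonged to the discarded suffix.
        banned = {p: s for p, s in banned.items() if p <= j}
        del tokens[j:]


print(generate_with_rollback(4))  # (['b', 'b', 'c', 'a'], 1)
```

The first pass produces `['a', 'b', 'c', 'a']`, which re-tokenizes as `['ab', 'c', 'a']`. After one rollback at position 0, the regenerated text round-trips cleanly. Checking once after generation, rather than at every step, leaves the sampler untouched while it runs. Banning the whole token at the divergence point is a deliberately coarse choice for this sketch.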

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20718/full.md

## Figures

20 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20718/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/2508.20718/full.md

---
Source: https://tomesphere.com/paper/2508.20718