Stochasticity in Tokenisation Improves Robustness

Sophie Steger; Rui Li; Sofiane Ennadir; Anya Sims; Arno Solin; Franz Pernkopf; Martin Trapp

arXiv:2604.16037·cs.CL·April 20, 2026

TL;DR

Stochastic tokenisation enhances the robustness of large language models against adversarial and random input perturbations without increasing inference costs.

Contribution

This study systematically demonstrates that training with stochastic tokenisation improves LLM robustness across various regimes and architectures.

Findings

01

Stochastic tokenisation reduces vulnerability to adversarial attacks.

02

Pre-training with stochastic tokenisation improves model robustness.

03

Training with stochastic tokenisation maintains accuracy without extra inference cost.

Abstract

The widespread adoption of large language models (LLMs) has increased concerns about their robustness. Vulnerabilities to perturbations of the input tokenisation indicate that models trained with a deterministic canonical tokenisation can be brittle under adversarial attacks. Recent studies suggest that stochastic tokenisation can yield internal representations that are less sensitive to perturbations. In this paper, we analyse how stochastic tokenisations affect robustness to adversarial attacks and random perturbations. We systematically study this over a range of learning regimes (pre-training, supervised fine-tuning, and in-context learning), data sets, and model architectures. We show that pre-training and fine-tuning with uniformly sampled stochastic tokenisations improve robustness to random and adversarial perturbations. Evaluating on uniformly sampled non-canonical…
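To make "uniformly sampled stochastic tokenisations" concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that "uniform" means every valid segmentation of a string over a fixed vocabulary is equally likely, and it uses a toy vocabulary; the function names `count_suffix_tokenisations` and `sample_uniform_tokenisation` are hypothetical. The sketch counts segmentations by dynamic programming and then samples each token boundary in proportion to the number of ways the remaining suffix can be completed.

```python
import random


def count_suffix_tokenisations(text, vocab):
    """counts[i] = number of ways to segment text[i:] into vocab tokens."""
    n = len(text)
    max_len = max(map(len, vocab))
    counts = [0] * (n + 1)
    counts[n] = 1  # the empty suffix has exactly one (empty) segmentation
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, min(n, i + max_len) + 1):
            if text[i:j] in vocab:
                counts[i] += counts[j]
    return counts


def sample_uniform_tokenisation(text, vocab, rng=random):
    """Draw one segmentation uniformly at random from all valid ones.

    Assumption (not taken from the paper): uniformity is over all
    segmentations of `text` into tokens from `vocab`.
    """
    counts = count_suffix_tokenisations(text, vocab)
    if counts[0] == 0:
        raise ValueError("text cannot be segmented with this vocabulary")
    n = len(text)
    max_len = max(map(len, vocab))
    tokens, i = [], 0
    while i < n:
        # Candidate end positions whose remaining suffix is still segmentable.
        ends = [
            j
            for j in range(i + 1, min(n, i + max_len) + 1)
            if text[i:j] in vocab and counts[j] > 0
        ]
        # Weighting by counts[j] makes every full segmentation equally likely.
        j = rng.choices(ends, weights=[counts[k] for k in ends], k=1)[0]
        tokens.append(text[i:j])
        i = j
    return tokens


# Toy vocabulary, chosen only for illustration.
toy_vocab = {"a", "b", "ab"}
toy_text = "abab"
n_segmentations = count_suffix_tokenisations(toy_text, toy_vocab)[0]
sampled = sample_uniform_tokenisation(toy_text, toy_vocab, random.Random(0))
```

In this toy case "abab" has four segmentations (a|b|a|b, ab|a|b, a|b|ab, ab|ab), and each is drawn with probability 1/4. A deterministic canonical tokeniser would instead always return the same one of these.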
