Sampling from Your Language Model One Byte at a Time

Jonathan Hayase; Alisa Liu; Noah A. Smith; Sewoong Oh

arXiv:2506.14123·cs.CL·May 8, 2026

TL;DR

This paper introduces an inference-time method that converts an autoregressive language model with a BPE tokenizer into a character-level or byte-level model. The conversion addresses the Prompt Boundary Problem and makes it possible to ensemble models that use different tokenizers and to transfer between tokenization schemes.

Contribution

The authors propose an inference-time technique that unifies the vocabularies of language models with different tokenizers and mitigates the Prompt Boundary Problem, which facilitates ensembling and transfer learning across tokenizers.

Findings

01

Effectively solves the Prompt Boundary Problem at inference time.

02

Enables ensembling of models with different tokenizers.

03

Allows transfer learning between models with different tokenization schemes.
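
To make the ensembling finding concrete: once every model exposes a distribution over the next byte, all models share the same 256-symbol vocabulary, so their predictions can be combined directly. The following is a minimal sketch under that assumption. The helper names (`ensemble_next_byte`, `toy_dist`), the toy probabilities, and the choice of a weighted arithmetic mean are illustrative assumptions, not the paper's procedure or the byte-sampler API.

```python
def ensemble_next_byte(dists, weights=None):
    """Return the weighted average of several next-byte distributions.

    Each distribution is a list of 256 probabilities, one per byte value.
    Averaging is an assumed combination rule, chosen only for illustration.
    """
    if weights is None:
        weights = [1.0 / len(dists)] * len(dists)
    mixed = [0.0] * 256
    for dist, w in zip(dists, weights):
        for b in range(256):
            mixed[b] += w * dist[b]
    return mixed


def toy_dist(probs):
    """Build a length-256 distribution from a {ASCII character: probability} dict."""
    dist = [0.0] * 256
    for ch, p in probs.items():
        dist[ord(ch)] = p
    return dist


# Two hypothetical byte-level models that disagree about the next byte.
model_a = toy_dist({"a": 0.75, "b": 0.25})
model_b = toy_dist({"a": 0.25, "b": 0.75})

mixed = ensemble_next_byte([model_a, model_b])
print(mixed[ord("a")], mixed[ord("b")])  # 0.5 0.5
```

The point of the sketch is only that no vocabulary alignment step is needed at the byte level; with token-level outputs, the two models' distributions would range over different token sets and could not be averaged entry by entry.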

Abstract

Tokenization is used almost universally by modern language models, enabling efficient text representation using multi-byte or multi-character tokens. However, prior work has shown that tokenization can introduce distortion into the model's generations, an issue known as the Prompt Boundary Problem (PBP). For example, users are often advised not to end their prompts with a space because it prevents the model from including the space as part of the next token. While this heuristic is effective in English, the underlying PBP continues to affect code generation and languages such as Chinese, where tokens often do not line up with word and syntactic boundaries. In this work, we present an inference-time method to convert any autoregressive LM with a BPE tokenizer into a character-level or byte-level LM. Our method efficiently solves the PBP and is also able to unify the vocabularies of…
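
The trailing-space example in the abstract can be reproduced with a toy tokenizer. The sketch below is an illustration only: the vocabulary is hypothetical, and greedy longest-match tokenization is used as a simplified stand-in for BPE. It shows that the tokens of a prompt ending in a space are not a prefix of the tokens of the full text, and it lists the tokens that straddle the end of the prompt, which are the candidates a byte-level view has to account for. It is not the paper's algorithm.

```python
# Hypothetical toy vocabulary; real BPE vocabularies are learned from data.
VOCAB = ["Hello", " world", " ", "world", "H", "e", "l", "o", "w", "r", "d"]


def greedy_tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization (a simplified stand-in for BPE)."""
    tokens = []
    i = 0
    while i < len(text):
        candidates = [t for t in vocab if text.startswith(t, i)]
        if not candidates:
            raise ValueError(f"no token matches at position {i}")
        match = max(candidates, key=len)
        tokens.append(match)
        i += len(match)
    return tokens


def boundary_straddling_tokens(prompt, vocab=VOCAB):
    """Tokens that could start inside the prompt and extend past its end."""
    result = []
    for t in vocab:
        # A proper, non-empty prefix of t must match the end of the prompt.
        if any(prompt.endswith(t[:k]) for k in range(1, len(t))):
            result.append(t)
    return result


full_tokens = greedy_tokenize("Hello world")
prompt_tokens = greedy_tokenize("Hello ")

print(full_tokens)    # ['Hello', ' world']
print(prompt_tokens)  # ['Hello', ' ']
print(boundary_straddling_tokens("Hello "))  # [' world']
```

Because the prompt is tokenized as `['Hello', ' ']`, a model that continues token by token can no longer emit `' world'`, even though that is how the full text would normally be tokenized. Sampling at the byte level avoids committing to the final prompt token before the continuation is known.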

Code & Models

Repositories

SewoongLab/byte-sampler (GitHub)
