Are you going to finish that? A Practical Study of the Partial Token Problem
Hao Xu, Alisa Liu, Jonathan Hayase, Yejin Choi, Noah A. Smith

TL;DR
This paper investigates the partial token problem in language models, revealing its prevalence in realistic prompts across languages and code, and evaluates mitigation strategies to address the resulting probability distortions.
Contribution
It systematically studies the partial token problem in natural and code prompts, quantifies its severity, and evaluates practical mitigation techniques for inference-time correction.
Findings
Partial tokens cause significant probability distortion in language models.
The problem persists and worsens with larger models.
Recent exact mitigation solutions are effective in reducing probability errors.
Abstract
Language models (LMs) are trained over sequences of tokens, whereas users interact with LMs via text. This mismatch gives rise to the partial token problem, which occurs when a user ends their prompt in the middle of the expected next token, leading to distorted next-token predictions. Although this issue has been studied using arbitrary character prefixes, its prevalence and severity in realistic prompts that respect word boundaries remain underexplored. In this work, we identify three domains where token and "word" boundaries often do not line up: languages that do not use whitespace, highly compounding languages, and code. In Chinese, for example, up to 25% of word boundaries do not line up with token boundaries, making even natural, word-complete prompts susceptible to this problem. We systematically construct semantically natural prompts ending with partial tokens; in experiments,…
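The mismatch described in the abstract can be illustrated with a minimal sketch. It assumes a toy vocabulary and a greedy longest-match tokenizer, neither of which comes from the paper; real LMs typically use BPE or similar schemes, and `TOY_VOCAB`, `greedy_tokenize`, and `ends_in_partial_token` are hypothetical names introduced only for illustration. A prompt is flagged when its own tokenization is not a prefix of the tokenization of the prompt plus its intended continuation.

```python
# Hypothetical toy vocabulary; not taken from the paper or any real tokenizer.
TOY_VOCAB = {"h", "e", "l", "o", "hel", "hello", " ", "world"}


def greedy_tokenize(text, vocab):
    """Split text by repeatedly taking the longest vocabulary entry that matches."""
    tokens = []
    max_len = max(len(tok) for tok in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return tokens


def ends_in_partial_token(prompt, continuation, vocab):
    """True if tokenizing the prompt alone disagrees with the tokenization
    of the full text (prompt + continuation) over the prompt's span."""
    prompt_tokens = greedy_tokenize(prompt, vocab)
    full_tokens = greedy_tokenize(prompt + continuation, vocab)
    return full_tokens[: len(prompt_tokens)] != prompt_tokens


# "hel" is tokenized on its own, but the full text "hello" is a single token,
# so the prompt ends in the middle of the token the model would expect.
print(ends_in_partial_token("hel", "lo", TOY_VOCAB))
# "hello" followed by " world" splits cleanly at the prompt boundary.
print(ends_in_partial_token("hello", " world", TOY_VOCAB))
```

In this toy setting the first prompt is affected and the second is not. The same check could in principle be run with a real tokenizer in place of `greedy_tokenize` to see whether a given prompt boundary falls inside a token.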
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
