You Can Learn Tokenization End-to-End with Reinforcement Learning
Sam Dauncey, Roger Wattenhofer

TL;DR
This paper introduces a reinforcement learning approach that learns token boundaries end-to-end inside the language model. It optimizes the discrete boundary decisions directly with score function estimates and outperforms prior straight-through estimates.
Contribution
It demonstrates that token boundaries can be effectively learned using score function estimates and reinforcement learning, providing a theoretically grounded and practical alternative to prior heuristics and straight-through methods.
Findings
Reinforcement learning techniques such as time discounting reduce the variance of the score function estimates used to learn token boundaries.
The method outperforms prior straight-through estimates.
The method is effective at the 100 million parameter scale.
Abstract
Tokenization is a hardcoded compression step which remains in the training pipeline of Large Language Models (LLMs), despite a general trend towards architectures becoming increasingly end-to-end. Prior work has shown promising results at scale in bringing this compression step inside the LLMs' architecture with heuristics to draw token boundaries, and also attempts to learn these token boundaries with straight-through estimates, which treat the problem of drawing discrete token boundaries as a continuous one. We show that these token boundaries can instead be learned using score function estimates, which have tighter theoretical guarantees due to directly optimizing the problem of drawing discrete token boundaries to minimize loss. We observe that techniques from reinforcement learning, such as time discounting, are necessary to reduce the variance of this score function sufficiently…
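The idea of a score function estimate with time discounting can be illustrated with a small sketch. This is not the authors' implementation. It assumes, purely for illustration, that each position draws an independent Bernoulli boundary from a logit, and that a per-position reward (for example, a negative loss) is available. The names `discounted_returns`, `score_function_grad`, and `gamma` are hypothetical.

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """Return G_t = sum over k >= t of gamma**(k - t) * rewards[k]."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Accumulate from the last position backwards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def score_function_grad(logits, boundaries, rewards, gamma):
    """Score function (REINFORCE) gradient estimate w.r.t. boundary logits.

    Assumption: boundary b_t ~ Bernoulli(sigmoid(logits[t])), for which
    d/dlogit log p(b_t) = b_t - sigmoid(logits[t]).
    Each decision is weighted by the discounted reward that follows it.
    """
    logits = np.asarray(logits, dtype=float)
    boundaries = np.asarray(boundaries, dtype=float)
    probs = 1.0 / (1.0 + np.exp(-logits))
    returns = discounted_returns(rewards, gamma)
    return (boundaries - probs) * returns

# Toy usage: sample boundaries, then estimate the gradient.
rng = np.random.default_rng(0)
toy_logits = np.zeros(6)
toy_boundaries = (rng.random(6) < 0.5).astype(float)
toy_rewards = -np.ones(6)  # hypothetical negative per-position loss
toy_grad = score_function_grad(toy_logits, toy_boundaries, toy_rewards, gamma=0.9)
```

With `gamma` below 1, each boundary decision is credited mostly with rewards at nearby later positions. This is one common way to lower the variance of such estimates, at the cost of some bias.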
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Natural Language Processing Techniques
