Dynamic Vocabulary Pruning: Stable LLM-RL by Taming the Tail

Yingru Li; Jiawei Xu; Jiacai Liu; Yuxuan Tong; Ziniu Li; Tianle Cai; Ge Zhang; Qian Liu; Baoxiang Wang

arXiv:2512.23087 · cs.LG · February 9, 2026

Open Access

TL;DR

This paper introduces Dynamic Vocabulary Pruning (DVP), which stabilizes reinforcement learning for large language models by excluding low-probability tail tokens. These are the tokens for which the log-probability mismatch between inference and training engines is largest, and which bias gradient estimates when sampled.

Contribution

The paper proposes DVP, which dynamically prunes tail tokens from the vocabulary to improve RL stability in LLMs. It supports the method with theoretical bounds on the bias that pruning introduces and with empirical validation.
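To make the idea of pruning tail tokens concrete, here is a minimal sketch. This page does not state the pruning rule DVP actually uses, so the rule below is an assumption chosen for illustration: keep a token only if its probability is at least `rel_threshold` times the top token's probability (similar in spirit to min-p sampling), then renormalize over the kept set. The names `prune_tail`, `rel_threshold`, and `log_softmax` are hypothetical.

```python
import math


def prune_tail(logits, rel_threshold=0.05):
    """Mask tokens whose probability is below rel_threshold * max probability.

    ASSUMPTION: this relative-threshold rule is illustrative only; it is not
    taken from the paper. rel_threshold must lie in (0, 1].
    Returns (masked_logits, kept_indices).
    """
    top = max(logits)
    # p_i / p_max = exp(z_i - top), so the test needs no normalization.
    cutoff = top + math.log(rel_threshold)
    kept = [i for i, z in enumerate(logits) if z >= cutoff]
    masked = [z if z >= cutoff else -math.inf for z in logits]
    return masked, kept


def log_softmax(logits):
    """Numerically stable log-softmax; masked (-inf) entries stay at -inf."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - log_z for z in logits]


if __name__ == "__main__":
    masked, kept = prune_tail([5.0, 4.0, 0.0, -2.0])
    print("kept token indices:", kept)
    print("renormalized log-probs:", log_softmax(masked))
```

The kept set is recomputed from each step's own logits, which is one plausible reading of "dynamic": a peaked distribution keeps very few tokens, while a flat one keeps many.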

Findings

01. DVP stabilizes RL training for LLMs.

02. The paper derives theoretical bounds on the bias introduced by pruning.

03. Empirical results show improved training stability.

Abstract

Reinforcement Learning (RL) for Large Language Models (LLMs) faces a fundamental tension: the numerical divergence between high-throughput inference engines and numerically precise training engines. Although these systems share the same parameters, they produce slightly different probability distributions, creating a training-inference mismatch. We prove that the bound on the log-probability divergence arising from this mismatch scales as $(1 - p)$, where $p$ is the token probability. This scaling induces a highly asymmetric effect: the bound vanishes for high-probability tokens but remains significant for low-probability tokens in the distribution tail. When sampled, these tail tokens introduce systematically biased errors that accumulate over sequences, thereby destabilizing gradient estimation. Instead of applying post-hoc corrections, we propose Dynamic Vocabulary Pruning (DVP), which…
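The $(1 - p)$ scaling can be illustrated numerically. The sketch below is not the paper's proof or setup. It assumes the mismatch can be modeled as a perturbation of each logit by at most `delta`, and it uses a standard softmax inequality under that assumption: the log-probability of a token with probability $p$ shifts by at most $(1 - p)(e^{2\delta} - 1)$. The function names and the example logits are hypothetical.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - log_z for z in logits]


def mismatch_report(logits, delta, seed=0):
    """Perturb each logit by at most `delta` and measure log-prob shifts.

    ASSUMPTION: bounded per-logit noise stands in for the numerical
    mismatch between inference and training engines.
    Returns a list of (p, |shift in log p|, envelope) per token, where
    envelope = (1 - p) * (exp(2 * delta) - 1).
    """
    rng = random.Random(seed)
    noisy = [z + rng.uniform(-delta, delta) for z in logits]
    rows = []
    for lp_ref, lp_alt in zip(log_softmax(logits), log_softmax(noisy)):
        p = math.exp(lp_ref)
        shift = abs(lp_alt - lp_ref)
        envelope = (1.0 - p) * math.expm1(2.0 * delta)
        rows.append((p, shift, envelope))
    return rows


if __name__ == "__main__":
    # One dominant token followed by progressively rarer tail tokens.
    for p, shift, envelope in mismatch_report([8.0, 2.0, 0.0, -3.0], 1e-2):
        print(f"p={p:.6f}  shift={shift:.3e}  envelope={envelope:.3e}")
```

For the dominant token the envelope is close to zero, while for the rare tokens it approaches $e^{2\delta} - 1$, which mirrors the asymmetry the abstract describes.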

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms