Training-Trajectory-Aware Token Selection
Zhanming Shen, Jiaqi Hu, Zeyu Qin, Hao Chen, Wentao Ye, Zenan Huang, Yihong Zhuang, Guoshan Lu, Junlin Zhou, Junbo Zhao

TL;DR
This paper introduces T3S, a token selection method that improves continual distillation by addressing token-level confidence bifurcation, leading to significant performance gains on reasoning benchmarks.
Contribution
The paper proposes a novel training-trajectory-aware token selection technique that reconstructs the training objective at the token level to enhance distillation effectiveness.
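Reconstructing the objective at the token level generally means that only some positions in a teacher trace contribute to the loss. The sketch below shows that general idea as a masked mean negative log-likelihood. It assumes the student's probability for each target token has already been computed. The selection mask is a hypothetical input: the page does not state how T3S chooses tokens, so this is not the paper's objective.

```python
import math
from typing import Sequence


def token_selected_nll(
    token_probs: Sequence[float],  # student p(target token | prefix), one value per position
    keep_mask: Sequence[bool],     # hypothetical selection: True = position contributes to the loss
) -> float:
    """Mean negative log-likelihood computed over the selected positions only."""
    if len(token_probs) != len(keep_mask):
        raise ValueError("token_probs and keep_mask must have the same length")
    # Drop unselected positions entirely instead of down-weighting them.
    selected = [-math.log(p) for p, keep in zip(token_probs, keep_mask) if keep]
    if not selected:
        return 0.0  # no selected tokens in this sequence: no learning signal
    return sum(selected) / len(selected)


# Usage: the first (already confident) token is excluded from the loss.
loss = token_selected_nll([0.9, 0.2, 0.5], [False, True, True])
```

Normalizing by the number of selected tokens rather than by sequence length keeps the loss scale comparable when the mask removes many positions. This is a design choice made for the sketch, not one stated in the source.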
Findings
T3S improves the performance of both autoregressive (AR) and diffusion LLMs (dLLMs) across multiple models.
With T3S, Qwen3-8B surpasses DeepSeek-R1 on reasoning benchmarks using only a small number of training examples.
T3S-trained LLaDA-2.0-Mini exceeds its AR baseline, achieving state-of-the-art results.
Abstract
Efficient distillation is a key pathway for converting expensive reasoning capability into deployable efficiency. Yet in the frontier regime, where the student already has strong reasoning ability, naive continual distillation often yields limited gains or even degradation. We observe a characteristic training phenomenon: even as the loss decreases monotonically, all performance metrics can drop sharply at nearly the same bottleneck before gradually recovering. We further uncover a token-level mechanism: confidence bifurcates into steadily increasing Imitation-Anchor Tokens, which quickly anchor optimization, and yet-to-learn tokens, whose confidence is suppressed until after the bottleneck. The inability of these two token types to coexist is the root cause of failure in continual distillation. To this end, we propose Training-Trajectory-Aware Token Selection (T3S)…
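The confidence bifurcation described in the abstract can be made concrete by recording each token's confidence at several training checkpoints and splitting tokens by the shape of that trajectory. The sketch below does this with a simple rule chosen for illustration. A token counts as an anchor if its confidence never decreases and already reaches a threshold in the early part of training. Everything else counts as yet-to-learn. The thresholds, the `early_fraction` window, and the rule itself are hypothetical. The truncated abstract does not give the actual T3S criterion.

```python
from typing import Dict, List, Sequence


def split_by_trajectory(
    conf_history: Sequence[Sequence[float]],  # conf_history[t][i]: confidence of token i at checkpoint t
    anchor_threshold: float = 0.9,            # hypothetical cutoff for "already anchored"
    early_fraction: float = 0.5,              # hypothetical: first half of checkpoints counts as "early"
) -> Dict[str, List[int]]:
    """Split token indices into anchor-like and yet-to-learn groups (illustrative rule)."""
    num_ckpts = len(conf_history)
    if num_ckpts < 2:
        raise ValueError("need at least two checkpoints to observe a trajectory")
    num_tokens = len(conf_history[0])
    if any(len(row) != num_tokens for row in conf_history):
        raise ValueError("every checkpoint must report the same number of tokens")

    early = max(1, int(num_ckpts * early_fraction))  # number of checkpoints in the early window
    anchors: List[int] = []
    yet_to_learn: List[int] = []
    for i in range(num_tokens):
        traj = [conf_history[t][i] for t in range(num_ckpts)]
        steadily_rising = all(b >= a for a, b in zip(traj, traj[1:]))
        anchored_early = max(traj[:early]) >= anchor_threshold
        if steadily_rising and anchored_early:
            anchors.append(i)
        else:
            yet_to_learn.append(i)
    return {"anchor": anchors, "yet_to_learn": yet_to_learn}


# Usage: 4 checkpoints x 3 tokens. Token 0 rises monotonically and is confident early;
# tokens 1 and 2 dip mid-training before recovering, like the suppressed group.
history = [
    [0.60, 0.30, 0.50],
    [0.95, 0.20, 0.40],
    [0.97, 0.10, 0.60],
    [0.99, 0.40, 0.90],
]
groups = split_by_trajectory(history)
```

In practice such confidences would come from the student's probability of each teacher token, logged at every checkpoint. How those groups are then weighted or masked in the loss is the part T3S specifies, and it is not reproduced here.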
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
