Learning to Insert [PAUSE] Tokens for Better Reasoning

Eunki Kim; Sangryul Kim; James Thorne

arXiv:2506.03616 · cs.CL · August 12, 2025

TL;DR

This paper introduces Dynamic Inserting Tokens Training (DIT), a method that inserts [PAUSE] tokens at the positions in a sequence where the model's token log-likelihood is lowest, improving reasoning accuracy in large language models by up to 4.7% on GSM8K.

Contribution

The paper proposes a dynamic, confidence-based token insertion method that outperforms traditional fine-tuning and previous token insertion methods on reasoning tasks.

Findings

01. Up to 4.7% accuracy improvement on GSM8K

02. Up to 3.23% accuracy improvement on AQUA-RAT

03. Up to 3.4% pass@1 improvement on MBPP

Abstract

To enhance reasoning capabilities, previous works have explored incorporating special-purpose tokens into the training process. These strategies strengthen the learning mechanism of transformer-based large language models (LLMs). Building on prior research showing that inserting dummy tokens consecutively just before reasoning steps can enhance effectiveness, we introduce a novel approach termed Dynamic Inserting Tokens Training (DIT). Our method identifies the positions within a sequence where model confidence is lowest, as measured by token log-likelihood. Strategically inserting [PAUSE] tokens at these positions bolsters the model's predictive capabilities for subsequent tokens. Experimental results across diverse datasets and models, ranging from 2.7B to 8B parameters, demonstrate that DIT consistently outperforms traditional fine-tuning and previous token insertion methods. With this simple…
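The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `insert_pause_tokens`, the parameters `num_positions` and `pauses_per_position`, and the choice to place the [PAUSE] tokens immediately before each low-confidence token are all assumptions made for illustration. The per-token log-likelihoods are taken as given inputs; in practice they would come from a forward pass of the model being trained.

```python
from typing import List, Sequence

PAUSE = "[PAUSE]"  # special token named in the paper


def insert_pause_tokens(
    tokens: Sequence[str],
    log_probs: Sequence[float],
    num_positions: int = 1,        # hypothetical: how many low-confidence positions to pick
    pauses_per_position: int = 1,  # hypothetical: how many [PAUSE] tokens per position
) -> List[str]:
    """Insert [PAUSE] before the tokens with the lowest log-likelihood.

    Assumption: log_probs[i] is the model's log-likelihood of tokens[i]
    given the preceding tokens.
    """
    if len(tokens) != len(log_probs):
        raise ValueError("tokens and log_probs must have the same length")

    # Rank positions from least to most confident (lowest log-likelihood first).
    ranked = sorted(range(len(tokens)), key=lambda i: log_probs[i])
    chosen = set(ranked[:num_positions])

    augmented: List[str] = []
    for i, token in enumerate(tokens):
        if i in chosen:
            # Assumed placement: directly before the low-confidence token.
            augmented.extend([PAUSE] * pauses_per_position)
        augmented.append(token)
    return augmented


example_tokens = ["2", "+", "3", "=", "5"]
example_log_probs = [-0.1, -0.2, -0.3, -0.1, -2.5]
augmented_example = insert_pause_tokens(example_tokens, example_log_probs)
```

In this toy example the final token has the lowest log-likelihood, so `augmented_example` is `["2", "+", "3", "=", "[PAUSE]", "5"]`. The augmented sequence would then be used as fine-tuning data; how the loss treats the inserted tokens is not specified in the abstract and is left out of the sketch.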
