Learning a Continue-Thinking Token for Enhanced Test-Time Scaling

Liran Ringel; Elad Tolochinsky; Yaniv Romano

arXiv:2506.11274·cs.CL·May 14, 2026

TL;DR

This paper introduces a learned continue-thinking token for language models. Substituted for the end-of-thinking token at test time, it extends the model's reasoning and improves accuracy on math benchmarks more than a fixed token such as "Wait".

Contribution

The paper learns a dedicated "<|continue-thinking|>" token by training only its embedding with reinforcement learning while the model weights stay frozen, and shows that this token improves test-time reasoning beyond fixed-token budget forcing.

Findings

01  The learned token improves accuracy on standard math benchmarks over the baseline model.

02  It achieves greater gains than fixed-token budget forcing (e.g., "Wait").

03  It significantly boosts GSM8K benchmark performance.

Abstract

Test-time scaling has emerged as an effective approach for improving language model performance by utilizing additional compute at inference time. Recent studies have shown that overriding end-of-thinking tokens (e.g., replacing "</think>" with "Wait") can extend reasoning steps and improve accuracy. In this work, we explore whether a dedicated continue-thinking token can be learned to trigger extended reasoning. We augment a distilled version of DeepSeek-R1 with a single learned "<|continue-thinking|>" token, training only its embedding via reinforcement learning while keeping the model weights frozen. Our experiments show that this learned token achieves improved accuracy on standard math benchmarks compared to both the baseline model and a test-time scaling approach that uses a fixed token (e.g., "Wait") for budget forcing. In particular, we observe that in cases where the…
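The budget-forcing idea described in the abstract can be illustrated with a minimal decoding loop. This is only a sketch under stated assumptions, not the authors' implementation: `budget_forced_decode`, `toy_next_token`, and the parameter `num_forced_continuations` are hypothetical names, and the toy function stands in for a real language model. The loop covers the thinking phase only: whenever the model tries to emit "</think>" before the forced-continuation budget is used up, the end-of-thinking token is overridden with a continue token (the fixed "Wait" or the learned "<|continue-thinking|>").

```python
from typing import Callable, List

END_THINK = "</think>"
CONTINUE = "<|continue-thinking|>"  # token name taken from the abstract


def budget_forced_decode(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    num_forced_continuations: int,  # hypothetical parameter name
    continue_token: str = CONTINUE,
    max_tokens: int = 256,
) -> List[str]:
    """Decode the thinking phase token by token.

    The first `num_forced_continuations` attempts to emit END_THINK are
    overridden with `continue_token`, so the model keeps reasoning.
    """
    tokens = list(prompt)
    forced = 0
    while len(tokens) - len(prompt) < max_tokens:
        tok = next_token(tokens)
        if tok == END_THINK and forced < num_forced_continuations:
            # Override the end-of-thinking token and keep decoding.
            tokens.append(continue_token)
            forced += 1
            continue
        tokens.append(tok)
        if tok == END_THINK:
            break  # thinking phase is over; answer generation is out of scope
    return tokens


def toy_next_token(tokens: List[str]) -> str:
    """Hypothetical stand-in for a model: two reasoning steps, then stop.

    After any continue token it emits one extra "recheck" step.
    """
    last = tokens[-1]
    if last in (CONTINUE, "Wait"):
        return "recheck"
    if last == "recheck":
        return END_THINK
    return "step" if tokens.count("step") < 2 else END_THINK


if __name__ == "__main__":
    print(budget_forced_decode(toy_next_token, ["<think>"], 1))
    print(budget_forced_decode(toy_next_token, ["<think>"], 2, continue_token="Wait"))
```

In this sketch the fixed-token and learned-token variants differ only in which string is substituted. In the paper's setting the difference lies in the embedding behind that token, which is trained with reinforcement learning while the rest of the model stays frozen; that training step is not shown here.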

Code & Models

Repositories

liranringel/learning-continue-thinking-token (GitHub)
