SPINE: Token-Selective Test-Time Reinforcement Learning with Entropy-Band Regularization

Jianghao Wu; Yasmeen George; Jin Ye; Yicheng Wu; Daniel F. Schmidt; Jianfei Cai

arXiv:2511.17938 · cs.CL · March 9, 2026

TL;DR

SPINE introduces a token-selective reinforcement learning method that enhances test-time reasoning in large language models by focusing updates on critical decision points and using entropy regularization, leading to more stable and effective adaptation.

Contribution

It proposes a novel token-selective test-time reinforcement learning framework that improves reasoning stability and performance without requiring labels or reward models.

Findings

01 Consistently improves Pass@1 across eight benchmarks.

02 Avoids response-length collapse during training.

03 Enhances stability of test-time reasoning in LLMs and MLLMs.

Abstract

Large language models (LLMs) and multimodal LLMs (MLLMs) excel at chain-of-thought reasoning but face distribution shift at test time and a lack of verifiable supervision. Recent test-time reinforcement learning (TTRL) methods derive label-free pseudo-rewards from self-consistency voting over sampled trajectories, yet they often collapse: the majority-vote reward prevails, responses shorten, and Pass@1 declines. We trace this to uniform sequence updates in which most tokens are low-entropy followers, while a small high-entropy subset determines the reasoning branches. Thus we propose SPINE, a token-selective test-time reinforcement learning framework that (i) performs distribution-aware forking-token selection to update only decision-critical branch points, and (ii) applies a robust entropy-band regularizer at those tokens to prevent premature collapse and suppress noisy drift.…
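The three ingredients named in the abstract (majority-vote pseudo-rewards, selection of high-entropy tokens, and an entropy band at those tokens) can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not give the selection rule or the form of the regularizer, so the top-fraction threshold, the hinge-style band penalty, and every function and parameter name below (`top_fraction`, `low`, `high`) are assumptions made purely for illustration.

```python
import math
from collections import Counter


def majority_vote_rewards(answers):
    """Label-free pseudo-reward: 1.0 if a sampled answer matches the majority vote, else 0.0."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def forking_token_mask(entropies, top_fraction=0.2):
    """Mark the highest-entropy tokens as candidate branch points.

    Assumption: a fixed top fraction stands in for the paper's
    distribution-aware selection, which the abstract does not specify.
    """
    k = max(1, math.ceil(top_fraction * len(entropies)))
    threshold = sorted(entropies, reverse=True)[k - 1]
    return [h >= threshold for h in entropies]


def entropy_band_penalty(entropies, mask, low, high):
    """Mean hinge penalty for selected tokens whose entropy leaves [low, high].

    Assumption: a two-sided hinge is one plausible way to keep entropy
    inside a band; the paper's regularizer may take a different form.
    """
    selected = [h for h, m in zip(entropies, mask) if m]
    if not selected:
        return 0.0
    total = sum(max(0.0, low - h) + max(0.0, h - high) for h in selected)
    return total / len(selected)


# Toy usage: three sampled answers and per-token entropies of one trajectory.
rewards = majority_vote_rewards(["42", "42", "17"])
entropies = [0.1, 1.5, 0.05, 0.9, 0.2]
mask = forking_token_mask(entropies, top_fraction=0.4)
penalty = entropy_band_penalty(entropies, mask, low=1.0, high=1.2)
```

In a full training loop, the mask would restrict the policy-gradient update to the selected positions and the penalty would be added to the loss at those same positions; the unselected low-entropy "follower" tokens would receive no update.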

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)