Cautious Next Token Prediction

Yizhou Wang; Lingzhi Zhang; Yue Bai; Mang Tik Chiu; Zhengmian Hu; Mingyuan Zhang; Qihua Dong; Yu Yin; Sohrab Amirghodsi; Yun Fu

arXiv:2507.03038 · cs.CL · July 24, 2025

TL;DR

The paper introduces Cautious Next Token Prediction (CNTP), a training-free decoding method that improves language model outputs by sampling multiple paths when uncertainty is high, outperforming standard strategies.

Contribution

CNTP is a novel decoding strategy that adaptively samples multiple paths based on model confidence, enhancing performance without additional training.

Findings

01

CNTP outperforms existing decoding methods across various NLP tasks.

02

Integrating CNTP with self-consistency yields further improvements.

03

CNTP aligns with human-like cautious exploration during uncertain predictions.

Abstract

The next-token prediction paradigm has prevailed for autoregressive models in the era of LLMs. The current default sampling choice for popular LLMs is temperature scaling together with nucleus sampling, which balances diversity and coherence. Nevertheless, this approach leads to inferior performance on various NLP tasks when the model is uncertain about the test questions. To this end, we propose a new training-free decoding strategy, dubbed Cautious Next Token Prediction (CNTP). During decoding, if the model has comparatively high prediction entropy at a certain step, we independently sample multiple trials starting from that step and stop each one when it encounters any punctuation. We then select the trial with the lowest perplexity score, viewed as the most probable and reliable path given the model's capacity. The trial number is negatively correlated with the prediction…
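
The decoding loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `cautious_decode`, `sample_trial`, and `toy_model`, the entropy threshold, and the punctuation set are all assumptions made for the example. The abstract says the trial number varies with the model's prediction, but the sketch uses a fixed `num_trials` for simplicity. The model is a hypothetical callable that maps a token list to a next-token probability dictionary.

```python
import math
import random

PUNCTUATION = {".", ",", "!", "?", ";", ":"}  # assumed stop set for trials
EOS = "<eos>"                                 # hypothetical end-of-sequence token


def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def perplexity(logprobs):
    """exp of the mean negative log-probability of the sampled tokens."""
    return math.exp(-sum(logprobs) / len(logprobs))


def sample_token(probs, rng):
    """Draw one token from the distribution and return it with its log-prob."""
    tokens = list(probs)
    token = rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
    return token, math.log(probs[token])


def sample_trial(next_probs, prefix, rng, max_len=16):
    """Sample a continuation until punctuation, EOS, or max_len tokens."""
    tokens, logprobs = [], []
    while len(tokens) < max_len:
        token, lp = sample_token(next_probs(prefix + tokens), rng)
        tokens.append(token)
        logprobs.append(lp)
        if token in PUNCTUATION or token == EOS:
            break
    return tokens, logprobs


def cautious_decode(next_probs, prompt, rng, entropy_threshold=0.5,
                    num_trials=5, max_new_tokens=32):
    """Sample normally when confident; branch into trials when uncertain."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        probs = next_probs(out)
        if entropy(probs) > entropy_threshold:
            # Uncertain step: run independent trials, keep the lowest-perplexity one.
            trials = [sample_trial(next_probs, out, rng) for _ in range(num_trials)]
            best_tokens, _ = min(trials, key=lambda t: perplexity(t[1]))
            out.extend(best_tokens)
        else:
            # Confident step: ordinary single-token sampling.
            token, _ = sample_token(probs, rng)
            out.append(token)
        if out[-1] == EOS:
            break
    return out


def toy_model(tokens):
    """Hypothetical stand-in for an LLM: uncertain only after 'the'."""
    last = tokens[-1] if tokens else ""
    if last == "the":
        return {"cat": 0.4, "dog": 0.3, "fish": 0.3}
    if last in {"cat", "dog", "fish"}:
        return {"sleeps": 0.9, "runs": 0.1}
    if last in {"sleeps", "runs"}:
        return {".": 1.0}
    if last == ".":
        return {EOS: 1.0}
    return {"the": 1.0}


result = cautious_decode(toy_model, ["the"], random.Random(0))
print(" ".join(result))
```

With the toy model, the first step after "the" exceeds the entropy threshold, so the decoder samples several short continuations up to the period and keeps the one with the lowest perplexity. A real setup would replace `toy_model` with a language model's softmax output and would likely apply temperature and nucleus sampling inside `sample_token`.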

Code & Models

Repositories

wyzjack/CNTP
jax · Official

Taxonomy

Topics: Data Quality and Management · Software System Performance and Reliability · Natural Language Processing Techniques