Reasoning Bias of Next Token Prediction Training

Pengxiao Lin; Zhongwang Zhang; Zhi-Qin John Xu

arXiv:2502.02007·cs.CL·February 21, 2025

Open Access

TL;DR

This paper compares the reasoning capabilities of Large Language Models trained with next token prediction (NTP) against those trained with critical token prediction (CTP), and finds that NTP outperforms CTP, an outcome the authors attribute to the regularizing effect of noise during training.

Contribution

The study demonstrates that next token prediction training leads to better reasoning abilities in LLMs than critical token prediction, challenging the assumption that training only on critical tokens helps by reducing overfitting to extraneous information and noise.

Findings

01 NTP-trained models show superior reasoning performance.

02 Noise during NTP training acts as a regularizer.

03 NTP models exhibit greater robustness and flatter minima.
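
The summary above does not say how flatness was measured. As an illustration only, one common probe is the average rise in loss when the parameters at a minimum are perturbed by small Gaussian noise: a flatter minimum shows a smaller rise. The sketch below applies that probe to two toy quadratic losses; the function names, the noise scale `sigma`, and the toy losses are all assumptions made for this example and are not taken from the paper.

```python
import random


def sharpness(loss, w, sigma=0.1, trials=1000, seed=0):
    """Average loss increase under Gaussian perturbations of w (hypothetical probe)."""
    rng = random.Random(seed)  # fixed seed so both losses see the same perturbations
    base = loss(w)
    total = 0.0
    for _ in range(trials):
        perturbed = [wi + rng.gauss(0.0, sigma) for wi in w]
        total += loss(perturbed) - base
    return total / trials


def flat_loss(w):
    """Toy quadratic bowl with low curvature (stands in for a flat minimum)."""
    return 0.5 * sum(wi * wi for wi in w)


def sharp_loss(w):
    """Toy quadratic bowl with ten times the curvature (a sharper minimum)."""
    return 5.0 * sum(wi * wi for wi in w)


w_min = [0.0, 0.0]  # both toy losses are minimized at the origin
flat_score = sharpness(flat_loss, w_min)
sharp_score = sharpness(sharp_loss, w_min)
print(f"flat minimum:  {flat_score:.4f}")
print(f"sharp minimum: {sharp_score:.4f}")
```

Because both calls use the same seed, the two scores differ only through curvature, so the sharper bowl scores ten times higher. A real comparison would replace the toy losses with a model's evaluation loss and perturb its trained weights.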

Abstract

Since the inception of Large Language Models (LLMs), the quest to efficiently train them for superior reasoning capabilities has been a pivotal challenge. The dominant training paradigm for LLMs is based on next token prediction (NTP). An alternative methodology, called Critical Token Prediction (CTP), focuses exclusively on specific critical tokens (such as the answer in a Q&A dataset), aiming to reduce overfitting to extraneous information and noise. Contrary to initial assumptions, our research reveals that despite NTP's exposure to noise during training, it surpasses CTP in reasoning ability. We attribute this counterintuitive outcome to the regularizing influence of noise on the training dynamics. Our empirical analysis shows that NTP-trained models exhibit enhanced generalization and robustness across various benchmark reasoning datasets, demonstrating greater resilience to…
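
The difference between the two objectives described in the abstract can be sketched as a difference in which positions contribute to the loss. The following is a minimal illustration, assuming a per-position cross-entropy loss and a 0/1 mask over target positions; the toy probabilities, the mask layout, and the names `masked_loss`, `ntp_mask`, and `ctp_mask` are hypothetical and not taken from the paper.

```python
import math


def token_nll(probs, target):
    """Negative log-likelihood of the target token under one predicted distribution."""
    return -math.log(probs[target])


def masked_loss(step_probs, targets, mask):
    """Mean negative log-likelihood over the positions where mask is 1."""
    picked = [
        token_nll(p, t)
        for p, t, m in zip(step_probs, targets, mask)
        if m
    ]
    return sum(picked) / len(picked)


# Toy sequence: four target positions over a three-token vocabulary.
# The last position is treated as the "critical" token (e.g. the answer).
step_probs = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.25, 0.25, 0.50],
]
targets = [0, 1, 2, 2]

ntp_mask = [1, 1, 1, 1]  # NTP: every position contributes to the loss
ctp_mask = [0, 0, 0, 1]  # CTP: only the critical position contributes

ntp_loss = masked_loss(step_probs, targets, ntp_mask)
ctp_loss = masked_loss(step_probs, targets, ctp_mask)
print(f"NTP loss: {ntp_loss:.4f}")
print(f"CTP loss: {ctp_loss:.4f}")
```

Under this framing, NTP also receives gradient signal from the non-critical positions, which is where the noise the abstract refers to enters training.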

Taxonomy

Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling