Uncertainty-Guided Checkpoint Selection for Reinforcement Finetuning of Large Language Models

Manh Nguyen; Dung Nguyen; Dai Do; Svetha Venkatesh; Hung Le

arXiv:2511.09864 · cs.LG · November 14, 2025

Open Access

TL;DR

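This paper presents uncertainty-guided checkpoint selection (UGCS) for reinforcement learning (RL) finetuning of large language models. Rather than evaluating every checkpoint on a validation set or trusting the final checkpoint, it ranks checkpoints by the rewards they earn on the most uncertain (hard) samples, which gives a more stable selection signal and checkpoints that generalize better.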

Contribution

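It introduces an approach that identifies hard question-answer pairs through per-sample uncertainty and ranks checkpoints by how well they handle those pairs. Because the signal is computed without additional forward passes, it avoids the cost of repeated validation-set evaluation during training.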

Findings

01. Outperforms traditional checkpoint selection methods.
02. Consistently identifies checkpoints with better generalization.
03. Models solving hard tasks with low uncertainty are more reliable.

Abstract

Reinforcement learning (RL) finetuning is crucial to aligning large language models (LLMs), but the process is notoriously unstable and exhibits high variance across model checkpoints. In practice, selecting the best checkpoint is challenging: evaluating checkpoints on the validation set during training is computationally expensive and requires a good validation set, while relying on the final checkpoint provides no guarantee of good performance. We introduce an uncertainty-guided approach for checkpoint selection (UGCS) that avoids these pitfalls. Our method identifies hard question-answer pairs using per-sample uncertainty and ranks checkpoints by how well they handle these challenging cases. By averaging the rewards of the top-uncertain samples over a short training window, our method produces a stable and discriminative signal without additional forward passes or significant…
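
The selection signal described in the abstract (average the rewards of the most uncertain samples, then smooth over a short training window) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract shown here is truncated and does not specify how uncertainty is measured or which hyperparameters are used. The sketch assumes each training step already logs an `(uncertainty, reward)` pair per sample, that hard samples are picked per step, and the names `top_frac`, `window`, `score_checkpoints`, and `select_checkpoint` are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple

# One logged training sample: (uncertainty, reward).
# How uncertainty is computed is an assumption left to the caller.
Sample = Tuple[float, float]


def top_uncertain_mean_reward(samples: Sequence[Sample], top_frac: float = 0.25) -> float:
    """Mean reward over the top_frac most uncertain samples of one step."""
    if not samples:
        raise ValueError("need at least one sample")
    k = max(1, int(len(samples) * top_frac))
    # Sort by uncertainty, highest first, and keep the k hardest samples.
    hardest = sorted(samples, key=lambda s: s[0], reverse=True)[:k]
    return sum(reward for _, reward in hardest) / k


def score_checkpoints(
    step_logs: Dict[int, Sequence[Sample]],
    checkpoint_steps: Iterable[int],
    window: int = 3,
    top_frac: float = 0.25,
) -> Dict[int, float]:
    """Score each checkpoint by averaging the per-step signal over the
    `window` training steps that end at the checkpoint step."""
    scores: Dict[int, float] = {}
    for ckpt in checkpoint_steps:
        steps = [s for s in range(ckpt - window + 1, ckpt + 1) if s in step_logs]
        if not steps:
            continue  # no logged steps in this window, so no score
        per_step = [top_uncertain_mean_reward(step_logs[s], top_frac) for s in steps]
        scores[ckpt] = sum(per_step) / len(per_step)
    return scores


def select_checkpoint(scores: Dict[int, float]) -> int:
    """Return the checkpoint step with the highest windowed score."""
    if not scores:
        raise ValueError("no checkpoint could be scored")
    return max(scores, key=scores.get)
```

Because the sketch only reuses rewards and uncertainties that training already produces, it needs no extra forward passes, which matches the cost argument in the abstract. The window averaging is what smooths out step-to-step reward noise; a wider `window` trades responsiveness for stability.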

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications