Post-Completion Learning for Language Models

Xiang Fei; Siqi Wang; Shu Wei; Yuxiang Nie; Wei Shi; Hao Feng; Chao Feng; Can Huang

arXiv:2507.20252·cs.CL·August 13, 2025

TL;DR

This paper introduces Post-Completion Learning (PCL), a training framework that leverages the sequence space after model output completion to improve reasoning and self-evaluation abilities without sacrificing inference efficiency.

Contribution

The paper proposes a novel post-completion learning framework and a white-box reinforcement learning method to enhance language models' reasoning and evaluation capabilities.

Findings

01

Consistent improvements over traditional SFT and RL methods.

02

Enhanced reasoning and self-evaluation abilities in models.

03

Inference efficiency is preserved, because the extra self-assessment is generated only during training and inference stops at the completion point.

Abstract

Current language model training paradigms typically terminate learning upon reaching the end-of-sequence (<eos>) token, overlooking the potential learning opportunities in the post-completion space. We propose Post-Completion Learning (PCL), a novel training framework that systematically utilizes the sequence space after model output completion, to enhance both the reasoning and self-evaluation abilities. PCL enables models to continue generating self-assessments and reward predictions during training, while maintaining efficient inference by stopping at the completion point. To fully utilize this post-completion space, we design a white-box reinforcement learning method: let the model evaluate the output content according to the reward rules, then calculate and align the score with the reward functions for supervision. We implement dual-track SFT to optimize both reasoning and…
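
The mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The marker strings, function names, the exact-match reward rule, and the consistency formula are all assumptions made for illustration, since the abstract does not specify them. The sketch shows a training target that continues past the completion point with a self-evaluation and a predicted reward, an inference-time truncation at that point, and a toy signal that compares the predicted reward with a rule-based one.

```python
# Hypothetical marker strings; the abstract does not name the actual tokens.
COMPLETION_MARK = "<done>"
EOS = "<eos>"


def build_training_text(answer: str, self_eval: str, predicted_reward: float) -> str:
    """Training target: the answer, a completion marker, then the post-completion part."""
    return f"{answer}{COMPLETION_MARK}{self_eval} reward={predicted_reward:.1f}{EOS}"


def truncate_for_inference(generated: str) -> str:
    """At inference, keep only the text before the completion marker."""
    return generated.split(COMPLETION_MARK, 1)[0]


def rule_reward(answer: str, reference: str) -> float:
    """Toy rule-based reward (assumed): 1.0 on exact match, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def consistency_reward(predicted: float, actual: float) -> float:
    """Toy alignment signal (assumed): higher when the self-predicted score
    is closer to the rule-based reward, clipped to the range [0, 1]."""
    return 1.0 - min(1.0, abs(predicted - actual))


text = build_training_text("42", " The answer matches the reference.", 1.0)
answer_only = truncate_for_inference(text)
score = consistency_reward(1.0, rule_reward(answer_only, "42"))
```

In this toy setup the post-completion segment adds tokens only to the training sequence, while `truncate_for_inference` returns the same answer text a conventionally trained model would emit.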
