Post-Completion Learning for Language Models
Xiang Fei, Siqi Wang, Shu Wei, Yuxiang Nie, Wei Shi, Hao Feng, Chao Feng, Can Huang

TL;DR
This paper introduces Post-Completion Learning (PCL), a training framework that leverages the sequence space after model output completion to improve reasoning and self-evaluation abilities without sacrificing inference efficiency.
Contribution
The paper proposes a novel post-completion learning framework and a white-box reinforcement learning method to enhance language models' reasoning and evaluation capabilities.
Findings
Consistent improvements over traditional SFT and RL methods.
Enhanced reasoning and self-evaluation abilities in models.
Preserved inference efficiency: the post-completion segment is generated only during training, and inference still stops at the completion point.
Abstract
Current language model training paradigms typically terminate learning upon reaching the end-of-sequence (<eos>) token, overlooking the potential learning opportunities in the post-completion space. We propose Post-Completion Learning (PCL), a novel training framework that systematically utilizes the sequence space after model output completion, to enhance both the reasoning and self-evaluation abilities. PCL enables models to continue generating self-assessments and reward predictions during training, while maintaining efficient inference by stopping at the completion point. To fully utilize this post-completion space, we design a white-box reinforcement learning method: let the model evaluate the output content according to the reward rules, then calculate and align the score with the reward functions for supervision. We implement dual-track SFT to optimize both reasoning and…
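To make the idea in the abstract concrete, the following is a minimal, illustrative sketch of how a post-completion training sequence might be laid out and how inference could stop at the completion point. It is not the authors' implementation. The marker strings `ANSWER_END` and `SEQ_END`, the `TrainingExample` fields, and the toy `rule_reward` and `consistency_reward` functions are all assumptions introduced for illustration only.

```python
from dataclasses import dataclass

# Hypothetical markers (not taken from the paper):
ANSWER_END = "<answer_end>"  # completion point; inference stops here
SEQ_END = "<seq_end>"        # end of the training-only post-completion segment


@dataclass
class TrainingExample:
    reasoning_and_answer: str  # the part a user would see at inference time
    self_evaluation: str       # training-only self-assessment text
    predicted_reward: float    # reward the model itself predicts


def build_training_sequence(ex: TrainingExample) -> str:
    """Training target: answer, completion marker, then the post-completion segment."""
    return (
        f"{ex.reasoning_and_answer}{ANSWER_END}"
        f"{ex.self_evaluation} reward={ex.predicted_reward:.1f}{SEQ_END}"
    )


def truncate_for_inference(generated: str) -> str:
    """Keep only the text before the first completion marker."""
    return generated.split(ANSWER_END, 1)[0]


def rule_reward(answer: str, reference: str) -> float:
    """Toy rule-based reward: 1.0 if the answer ends with the reference string."""
    return 1.0 if answer.strip().endswith(reference) else 0.0


def consistency_reward(predicted: float, actual: float, tol: float = 1e-6) -> float:
    """Toy alignment signal: 1.0 when the self-predicted reward matches the rule-based one."""
    return 1.0 if abs(predicted - actual) <= tol else 0.0


if __name__ == "__main__":
    ex = TrainingExample(
        reasoning_and_answer="2 + 2 = 4",
        self_evaluation=" The final answer matches the reference.",
        predicted_reward=1.0,
    )
    full = build_training_sequence(ex)
    print(full)                          # full training target
    print(truncate_for_inference(full))  # what would be returned at inference
    actual = rule_reward(ex.reasoning_and_answer, "4")
    print(consistency_reward(ex.predicted_reward, actual))
```

In this sketch the self-evaluation and reward prediction add tokens only to the training target; the truncated output at inference is unchanged, which is the efficiency property the abstract describes.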
