Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning
Lingzhe Zhang, Tong Jia, Yunpeng Zhai, Liancheng Fang, Kening Zheng, Hongyi Liu, Xiaosong Huang, Philip S. Yu, Ying Li

TL;DR
This paper introduces RFT-FaultBench, a benchmark of fine-grained failures in reinforcement fine-tuning (RFT) of large language models, and proposes RFT-FM, an automatic failure management framework that detects, diagnoses, and remediates those failures.
Contribution
It presents the first benchmark for detailed failure analysis in RFT and develops an automatic system for failure detection, diagnosis, and remediation.
Findings
RFT failures are observable from training dynamics.
RFT-FaultBench reveals complex fault structures.
RFT-FM effectively detects and mitigates failures.
Abstract
Reinforcement fine-tuning (RFT) has become a core paradigm for post-training large language models, yet its training process remains highly fragile. Existing efforts mainly improve reliability at the system level or address specific issues in individual subproblems by modifying RFT algorithms. Despite their effectiveness, they largely overlook the problem of failure management at the training-process level. When training goes wrong, practitioners still rely heavily on expert-driven manual inspection and correction, and automatic failure management for RFT remains largely unexplored. In this paper, we take a first step toward systematic failure management for reinforcement fine-tuning. To understand the empirical structure of RFT failures, we first construct RFT-FaultBench, the first benchmark for fine-grained failures in reinforcement fine-tuning, covering 5 fault families, 16 fault…
