Goal-Conditioned Supervised Learning for LLM Fine-Tuning
Shijun Li, Kaiwen Dong, Xiang Gao, Joydeep Ghosh

TL;DR
This paper introduces goal-conditioned supervised learning (GCSL), a novel offline fine-tuning method for LLMs that directly optimizes responses to achieve explicit goals using graded feedback.
Contribution
The paper proposes GCSL, a new offline fine-tuning framework that treats feedback as explicit goals and improves upon existing supervised methods by guiding models towards outcome thresholds.
Findings
GCSL outperforms standard offline fine-tuning baselines across tasks.
Using goal thresholds mitigates the bounded-learning effect of traditional SFT.
Natural-language goal representations enhance model understanding and reasoning.
Abstract
Large language models often require fine-tuning to better align their behavior with user intent at deployment. Existing approaches are commonly divided into online and offline paradigms. Online methods, such as RL-based alignment, can directly optimize outcome quality but typically rely on external reward models and iterative rollouts, making them costly and difficult to deploy in many cases. Offline methods are more efficient, but prevailing approaches such as supervised fine-tuning (SFT) and direct preference optimization (DPO) remain limited: SFT typically collapses graded feedback into binary supervision, while DPO depends on paired preference data that is often unavailable or expensive to construct. In this paper, we propose goal-conditioned supervised learning (GCSL) as an offline fine-tuning framework for LLMs. Our core idea is to treat feedback signals directly as an explicit…
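The contrast the abstract draws, between SFT collapsing graded feedback into binary supervision and treating the feedback as an explicit goal, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the 1-5 rating scale, the `Sample` type, the goal template wording, and the rule that a response counts as an example for every threshold its score reaches are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str
    response: str
    score: int  # graded feedback; a 1-5 rating scale is assumed here


def sft_filter(samples, cutoff):
    """Binary supervision: keep responses at or above the cutoff, drop the rest."""
    return [(s.prompt, s.response) for s in samples if s.score >= cutoff]


def goal_conditioned_examples(samples, thresholds):
    """Pair each response with every goal threshold its score reaches.

    The goal is written in natural language and prepended to the prompt.
    The template wording below is hypothetical.
    """
    examples = []
    for s in samples:
        for t in thresholds:
            if s.score >= t:
                goal = f"Goal: write a response rated at least {t} out of 5."
                examples.append((f"{goal}\n{s.prompt}", s.response))
    return examples


samples = [
    Sample("Summarize the article.", "A short, vague summary.", 2),
    Sample("Summarize the article.", "A precise, complete summary.", 5),
]

# SFT-style filtering discards the low-rated response entirely.
sft_data = sft_filter(samples, cutoff=4)

# Goal conditioning keeps it, labeled with the goal it actually met.
gcsl_data = goal_conditioned_examples(samples, thresholds=[2, 4])
```

Under these assumptions the filtered set keeps only the highly rated response, while the goal-conditioned set also retains the weaker one as an example of meeting a lower goal. The resulting (input, target) pairs could then be passed to any standard supervised fine-tuning loop.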
