Goal-Conditioned Supervised Learning for LLM Fine-Tuning

Shijun Li; Kaiwen Dong; Xiang Gao; Joydeep Ghosh

arXiv:2605.16345 · cs.LG · May 19, 2026

TL;DR

This paper introduces goal-conditioned supervised learning (GCSL), an offline fine-tuning method for LLMs that treats graded feedback as an explicit goal and trains the model to produce responses that meet it.

Contribution

The paper proposes GCSL, a new offline fine-tuning framework that treats feedback as explicit goals and improves upon existing supervised methods by guiding models towards outcome thresholds.

Findings

01 GCSL outperforms standard offline fine-tuning baselines across tasks.

02 Using goal thresholds mitigates the bounded-learning effect of traditional SFT.

03 Natural-language goal representations enhance model understanding and reasoning.

Abstract

Large language models often require fine-tuning to better align their behavior with user intent at deployment. Existing approaches are commonly divided into online and offline paradigms. Online methods, such as RL-based alignment, can directly optimize outcome quality but typically rely on external reward models and iterative rollouts, making them costly and difficult to deploy in many cases. Offline methods are more efficient, but prevailing approaches such as supervised fine-tuning (SFT) and direct preference optimization (DPO) remain limited: SFT typically collapses graded feedback into binary supervision, while DPO depends on paired preference data that is often unavailable or expensive to construct. In this paper, we propose goal-conditioned supervised learning (GCSL) as an offline fine-tuning framework for LLMs. Our core idea is to treat feedback signals directly as an explicit…
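To make the contrast in the abstract concrete, here is a minimal sketch of how graded feedback might be turned into training pairs in two ways: a binary SFT-style filter that keeps only high-scored responses, and a goal-conditioned variant that prepends a natural-language goal to the prompt. Everything here is an illustrative assumption rather than the paper's actual procedure: the `Rated` record, the 1 to 5 score scale, the threshold values, the goal wording, and the choice to emit one example per threshold a response meets are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rated:
    """One logged interaction with graded feedback (hypothetical schema)."""
    prompt: str
    response: str
    score: int  # assumed scale: 1 (worst) to 5 (best)


def binary_sft_examples(data, keep_at_least):
    """Binary-style supervision: keep responses at or above a cutoff, drop the rest."""
    return [(d.prompt, d.response) for d in data if d.score >= keep_at_least]


def goal_conditioned_examples(data, thresholds):
    """Assumed goal-conditioned construction: for every threshold a response
    meets, emit a (goal + prompt, response) pair so lower-scored responses
    still contribute under weaker goals."""
    examples = []
    for d in data:
        for t in thresholds:
            if d.score >= t:
                # Hypothetical natural-language goal template.
                goal = f"Goal: write a response rated at least {t} out of 5."
                examples.append((f"{goal}\n{d.prompt}", d.response))
    return examples


data = [
    Rated("Summarize the memo.", "Response A", 5),
    Rated("Summarize the memo.", "Response B", 3),
    Rated("Draft a reply.", "Response C", 1),
]

sft = binary_sft_examples(data, keep_at_least=4)
gc = goal_conditioned_examples(data, thresholds=(2, 4))
print(len(sft))  # 1: only Response A survives the binary filter
print(len(gc))   # 3: A meets both goals, B meets the weaker one, C meets none
```

In this sketch the resulting pairs would be fed to an ordinary supervised fine-tuning loop, and at inference time one would condition on the most demanding goal. Whether the paper builds its goals and thresholds this way is not stated in the truncated abstract.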
