Triplets Better Than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs

Yibo Wang; Hai-Long Sun; Qing-Guo Chen; Zhao Xu; Weihua Luo; Kaifu Zhang; Lijun Zhang

arXiv:2601.08198 · cs.CL · January 14, 2026

TL;DR

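The paper introduces T-SPIN, a self-play fine-tuning method for large language models that stabilizes training and improves performance, especially when annotated data is limited. It does so with triplet-based advantages, which compare annotated responses, current synthetic responses, and "proto-synthetic" responses from the initial policy, together with entropy constraints.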

Contribution

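T-SPIN enhances self-play fine-tuning by incorporating historical advantages and entropy constraints, addressing two weaknesses of SPIN: optimization that becomes unstable as the current advantage vanishes over iterations, and a misalignment between the reference-based training reward and the metric used for generation.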
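The misalignment can be illustrated with a toy example. SPIN scores a response during training by the log-ratio beta * log(pi_theta(y|x) / pi_ref(y|x)), whereas likelihood-based decoding favors responses with high pi_theta(y|x) alone, so the two can rank the same pair of responses in opposite orders. This is a minimal sketch with made-up log-probabilities and a hypothetical `Response` record; it is not code from the paper, and it does not show how T-SPIN's entropy constraint resolves the gap.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical record of one candidate response to a fixed prompt x."""
    name: str
    logp_policy: float  # log pi_theta(y | x), summed over tokens
    logp_ref: float     # log pi_ref(y | x), summed over tokens


def reference_reward(r: Response, beta: float = 1.0) -> float:
    """SPIN-style training reward: beta * log(pi_theta / pi_ref)."""
    return beta * (r.logp_policy - r.logp_ref)


def generation_score(r: Response) -> float:
    """Likelihood under the policy alone, which decoding tends to favor."""
    return r.logp_policy


# Illustrative numbers only.
a = Response("a", logp_policy=-5.0, logp_ref=-4.0)   # likely under the policy
b = Response("b", logp_policy=-9.0, logp_ref=-20.0)  # unlikely, but far above the reference

best_by_reward = max([a, b], key=reference_reward)
best_by_generation = max([a, b], key=generation_score)
print("training reward prefers:", best_by_reward.name)   # b
print("generation prefers:", best_by_generation.name)    # a
```

Response b wins under the log-ratio reward (11.0 versus -1.0) purely because the reference policy assigns it very low probability, even though the policy itself would rarely generate it.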

Findings

1. T-SPIN outperforms SPIN on various tasks.

2. T-SPIN achieves results comparable to or better than supervised fine-tuning while using only 25% of the labeled data.

3. T-SPIN evolves stably across self-play iterations during training.

Abstract

Recently, self-play fine-tuning (SPIN) has been proposed to adapt large language models to downstream applications with scarce expert-annotated data, by iteratively generating synthetic responses from the model itself. However, SPIN is designed to optimize the current reward advantages of annotated responses over the synthetic responses at hand, which may gradually vanish during iterations, leading to unstable optimization. Moreover, the use of a reference policy induces a misalignment between the reward formulation for training and the metric for generation. To address these limitations, we propose a novel Triplet-based Self-Play fIne-tuNing (T-SPIN) method that integrates two key designs. First, beyond current advantages, T-SPIN additionally incorporates historical advantages between iteratively generated responses and proto-synthetic responses produced by the initial policy.…
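To make the contrast between current and historical advantages concrete, here is a minimal Python sketch on scalar rewards. The pairwise loss follows the SPIN objective: the logistic loss log(1 + exp(-t)) applied to the reward margin between the annotated response y and the synthetic response y'. The triplet variant is an assumption for illustration only. It adds a term for the margin of the current synthetic response over the proto-synthetic response y0, weighted by a hypothetical `alpha`. The abstract is truncated here and does not give T-SPIN's exact objective or reward, so `triplet_loss_sketch` should not be read as the paper's formula.

```python
import math


def logistic_loss(t: float) -> float:
    """l(t) = log(1 + exp(-t)), evaluated without overflow for large |t|."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))


def log_ratio_reward(logp_policy: float, logp_ref: float, beta: float = 1.0) -> float:
    """SPIN-style implicit reward beta * log(pi_theta(y|x) / pi_ref(y|x)).

    In SPIN the reference policy is the model from the previous iteration.
    """
    return beta * (logp_policy - logp_ref)


def spin_pair_loss(r_annotated: float, r_synthetic: float) -> float:
    """Pairwise SPIN loss on the current advantage r(y) - r(y')."""
    return logistic_loss(r_annotated - r_synthetic)


def triplet_loss_sketch(r_annotated: float, r_synthetic: float,
                        r_proto: float, alpha: float = 1.0) -> float:
    """HYPOTHETICAL triplet objective, not the paper's exact formula.

    current:    annotated y over current synthetic y'
    historical: current synthetic y' over proto-synthetic y0 (initial policy)
    alpha is an illustrative weight on the historical term.
    """
    current = logistic_loss(r_annotated - r_synthetic)
    historical = logistic_loss(r_synthetic - r_proto)
    return current + alpha * historical


# Late-iteration toy case: the synthetic response now earns the same reward as
# the annotated one, so the current advantage has vanished.
r_y = log_ratio_reward(-10.0, -12.0)      # annotated response: 2.0
r_syn = log_ratio_reward(-10.0, -12.0)    # current synthetic response: 2.0
r_proto = log_ratio_reward(-15.0, -14.0)  # proto-synthetic response: -1.0

print("current advantage:", r_y - r_syn)          # 0.0
print("historical advantage:", r_syn - r_proto)   # 3.0
print("pairwise loss:", spin_pair_loss(r_y, r_syn))
print("triplet loss (sketch):", triplet_loss_sketch(r_y, r_syn, r_proto))
```

In this toy case the pairwise term only sees a zero margin between y and y', while the historical term still carries a positive margin measured against the fixed anchor y0. How T-SPIN actually combines the two terms, and with which reward, should be checked against the full paper.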

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification