SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models

Yibo Wang; Qing-Guo Chen; Zhao Xu; Weihua Luo; Kaifu Zhang; Lijun Zhang

arXiv:2512.07175·cs.LG·December 9, 2025


Open Access

TL;DR

SPACE uses noise contrastive estimation to stabilize self-play fine-tuning of large language models, aligning model outputs with the real data distribution and improving performance.

Contribution

The paper proposes SPACE, a self-play fine-tuning method that uses noise contrastive estimation to ensure stable convergence and closer alignment with the real data distribution.

Findings

01 SPACE outperforms supervised fine-tuning while using fewer real samples.

02 SPACE evolves more stably than gap-based methods.

03 Empirical results show significant performance improvements across tasks.

Abstract

Self-play fine-tuning has demonstrated promising abilities in adapting large language models (LLMs) to downstream tasks with limited real-world data. The basic principle is to iteratively refine the model with real samples and synthetic ones generated from itself. However, the existing methods primarily focus on the relative gaps between the rewards for two types of data, neglecting their absolute values. Through theoretical analysis, we identify that the gap-based methods suffer from unstable evolution, due to the potentially degenerated objectives. To address this limitation, we introduce a novel self-play fine-tuning method, namely Self-PlAy via Noise Contrastive Estimation (SPACE), which leverages noise contrastive estimation to capture the real-world data distribution. Specifically, SPACE treats synthetic samples as auxiliary components, and discriminates them from the real ones in…
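The contrast the abstract draws between relative gaps and absolute reward values can be illustrated with a small sketch. This is not the paper's objective, which is cut off above; it is the generic binary noise contrastive estimation loss set beside a generic logistic loss on the reward gap. The names `gap_loss` and `nce_loss` and the reward definition (the log-likelihood ratio between the current model and the sample generator) are assumptions made for illustration only.

```python
import math

def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

# Assumption: a "reward" r is a log-likelihood ratio such as
# log p_model(y | x) - log p_generator(y | x). Hypothetical definition.

def gap_loss(r_real: float, r_synth: float) -> float:
    """Logistic loss on the reward gap; only r_real - r_synth matters."""
    return -log_sigmoid(r_real - r_synth)

def nce_loss(r_real: float, r_synth: float) -> float:
    """Generic binary NCE loss: classify real as 1 and synthetic as 0.

    Uses 1 - sigmoid(z) = sigmoid(-z), so each reward's absolute value matters.
    """
    return -log_sigmoid(r_real) - log_sigmoid(-r_synth)

# Shifting both rewards by the same constant keeps the gap at 2.0.
pairs = [(1.0, -1.0), (-4.0, -6.0)]
for r_real, r_synth in pairs:
    print(r_real, r_synth, gap_loss(r_real, r_synth), nce_loss(r_real, r_synth))
```

In this sketch the gap-based loss is identical for both reward pairs because it sees only their difference, while the NCE-style loss penalizes the second pair, where the real sample itself receives a low reward.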


Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Healthcare and Education