IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning

Wenjie Liao; Like Wu; Liangjie Zhao; Shihui Xu; Shigeru Fujimura

arXiv:2604.20933·cs.LG·April 24, 2026

TL;DR

IRIS introduces a flexible Rényi-based self-play fine-tuning framework for large language models that adaptively shifts its objective during training to improve performance with fewer annotations.

Contribution

IRIS unifies several self-play objectives under a Rényi divergence framework and proposes an adaptive schedule for improved training dynamics.

Findings

01 IRIS outperforms baselines on ten benchmarks.

02 With only 26k annotations, IRIS surpasses full-data supervised fine-tuning.

03 IRIS achieves a 44.57% average score across tasks.

Abstract

Self-play fine-tuning enables large language models to improve beyond supervised fine-tuning without additional human annotations by contrasting annotated responses with self-generated ones. Many existing methods rely on a fixed divergence regime. SPIN is closely related to a KL-based regime, SPACE to a Jensen-Shannon-style objective via noise contrastive estimation, and SPIF to $\chi^{2}$-regularized self-play. Since these divergences exhibit different strengths depending on the distributional gap between model and target, no single choice appears to provide favorable learning dynamics across training stages. We propose IRIS (Interpolative Rényi Iterative Self-play), a Rényi-based self-play fine-tuning framework with a continuously adjustable objective. IRIS decomposes into two independent tilted risk terms over annotated and synthetic data, with exponential importance weights…
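
To make the two ingredients named in the abstract more concrete, here is a minimal sketch of the textbook Rényi divergence between discrete distributions and of a generic tilted risk with its exponential weights. It is not the IRIS objective, which this listing does not spell out: the function names `renyi_divergence` and `tilted_risk`, the order `alpha`, and the tilt `t` are assumptions chosen for illustration, and `q` is assumed strictly positive wherever `p` is.

```python
import math


def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) in nats for discrete distributions.

    Illustrative only: assumes q[i] > 0 wherever p[i] > 0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if abs(alpha - 1.0) < 1e-9:
        # The limit alpha -> 1 recovers the KL divergence.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1.0)


def tilted_risk(losses, t):
    """Return ((1/t) * log mean(exp(t * loss)), normalized exponential weights).

    Generic tilted risk, not the paper's loss. With t = 0 it reduces to the
    plain mean with uniform weights.
    """
    n = len(losses)
    if t == 0:
        return sum(losses) / n, [1.0 / n] * n
    shift = max(t * x for x in losses)  # log-sum-exp stabilization
    exps = [math.exp(t * x - shift) for x in losses]
    z = sum(exps)
    risk = (shift + math.log(z / n)) / t
    weights = [e / z for e in exps]  # proportional to exp(t * loss)
    return risk, weights


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]
    q = [0.5, 0.3, 0.2]
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, renyi_divergence(p, q, alpha))
    print(tilted_risk([0.2, 1.0, 3.0], t=1.0))
```

Varying `alpha` moves the divergence continuously through the KL case at `alpha = 1`, which is the sense in which a Rényi-based objective can be adjusted rather than fixed. In the tilted risk, a positive `t` puts more weight on high-loss samples and a negative `t` on low-loss ones.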
