Your Self-Play Algorithm is Secretly an Adversarial Imitator: Understanding LLM Self-Play through the Lens of Imitation Learning

Shangzhe Li; Xuchao Zhang; Chetan Bansal; Weitong Zhang

arXiv:2602.01357·cs.LG·February 3, 2026

Open Access

TL;DR

This paper reveals that self-play finetuning of large language models can be understood as adversarial imitation learning, providing a theoretical framework and a new stable algorithm that improves performance across tasks.

Contribution

It introduces a novel game-theoretic perspective linking self-play to adversarial imitation learning and proposes a new finetuning algorithm based on $\chi^2$-divergence for better stability.

Findings

01

Convergence of self-play finetuning to equilibrium shown theoretically.

02

The proposed algorithm outperforms existing self-play methods.

03

Experimental validation across multiple language tasks.

Abstract

Self-play post-training methods have emerged as an effective approach for finetuning large language models, turning a weak language model into a strong one without preference data. However, the theoretical foundations of self-play finetuning remain underexplored. In this work, we address this gap by connecting self-play finetuning with adversarial imitation learning, formulating the finetuning procedure as a min-max game between the model and a regularized implicit reward player parameterized by the model itself. This perspective unifies self-play imitation and general preference alignment within a common framework. Under this formulation, we present a game-theoretic analysis showing that self-play finetuning converges to its equilibrium. Guided by this theoretical formulation, we propose a new self-play imitation finetuning algorithm based on the $\chi^2$-divergence…
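The abstract is cut off before it describes the proposed algorithm, so the sketch below is not the paper's method. It only illustrates the standard Pearson $\chi^2$-divergence that the abstract names, computed on toy discrete distributions. The function names and the example distributions are hypothetical, chosen purely for illustration.

```python
import math


def chi2_divergence(p, q):
    """Pearson chi^2 divergence: sum_x (p(x) - q(x))^2 / q(x).

    Assumes p and q are discrete distributions over the same support
    and that q(x) > 0 everywhere.
    """
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def kl_divergence(p, q):
    """KL divergence sum_x p(x) * log(p(x) / q(x)), shown for comparison."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy distributions over three outcomes (illustration only).
data_dist = [0.7, 0.2, 0.1]   # stands in for the target (demonstration) data
model_dist = [0.4, 0.4, 0.2]  # stands in for the current model

chi2 = chi2_divergence(data_dist, model_dist)
kl = kl_divergence(data_dist, model_dist)
print(f"chi^2 divergence: {chi2:.4f}")
print(f"KL divergence:    {kl:.4f}")
```

Both quantities are zero when the two distributions coincide and positive otherwise. How the paper uses the $\chi^2$-divergence inside its min-max formulation is not recoverable from the truncated abstract.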

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)