Team-Based Self-Play With Dual Adaptive Weighting for Fine-Tuning LLMs

Wu Li; Yigeng Zhou; Zesheng Shi; Yequan Wang; Min Zhang; Jing Li

arXiv:2605.09922 · cs.CL · May 12, 2026

TL;DR

The paper introduces TPAW, a novel self-play algorithm with adaptive weighting mechanisms that enhances the alignment of large language models through fully self-supervised training, outperforming existing methods.

Contribution

It proposes a team-based self-play framework with dual adaptive weighting to improve LLM alignment without human supervision.

Findings

1. TPAW outperforms existing baselines across various models and benchmarks.
2. Adaptive weighting mechanisms improve training stability and response quality.
3. The method reduces reliance on human-labeled data for alignment.

Abstract

While recent self-training approaches have reduced reliance on human-labeled data for aligning LLMs, they still face critical limitations: (i) sensitivity to synthetic data quality, leading to instability and bias amplification in iterative training; (ii) ineffective optimization due to a diminishing gap between positive and negative responses over successive training iterations. In this paper, we propose Team-based self-Play with dual Adaptive Weighting (TPAW), a novel self-play algorithm designed to improve alignment in a fully self-supervised setting. TPAW adopts a team-based framework in which the current policy model both collaborates with and competes against historical checkpoints, promoting more stable and efficient optimization. To further enhance learning, we design two adaptive weighting mechanisms: (i) a response reweighting scheme that adjusts the importance of target…
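The abstract outlines the objective only at a high level and is truncated before the weighting details. The following is a rough sketch under stated assumptions, not TPAW's actual method. It assumes a SPIN-style logistic loss on the implicit reward gap between a target response and an opponent response sampled from one of several historical checkpoints. It shows how per-pair weights could emphasize pairs whose gap has shrunk, which is limitation (ii) above. `Pair`, `margin`, `gap_weights`, `team_self_play_loss`, `lam`, and `temperature` are hypothetical names. The softmax-over-margins weighting rule is our own illustrative choice. The "collaboration" side of the team framework is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical preference pair of sequence-level log-probs, log p(y | x)."""
    policy_target: float    # log p_theta(y | x): target response under the current policy
    ref_target: float       # log p_ref(y | x): reference model (e.g. previous iteration, as in SPIN)
    policy_opponent: float  # log p_theta(y' | x): y' sampled from some historical checkpoint
    ref_opponent: float     # log p_ref(y' | x)


def margin(pair: Pair, lam: float) -> float:
    """SPIN/DPO-style implicit reward gap: target reward minus opponent reward."""
    target_reward = lam * (pair.policy_target - pair.ref_target)
    opponent_reward = lam * (pair.policy_opponent - pair.ref_opponent)
    return target_reward - opponent_reward


def logistic_loss(t: float) -> float:
    """Numerically stable log(1 + exp(-t))."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))


def gap_weights(margins: list[float], temperature: float) -> list[float]:
    """Hypothetical reweighting (an assumption, not the paper's scheme):
    softmax over negated margins, so pairs with a smaller target/opponent gap
    get more weight. Scaled so the weights sum to len(margins)."""
    scores = [-m / temperature for m in margins]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [len(margins) * e / total for e in exps]


def team_self_play_loss(pairs: list[Pair], lam: float = 0.1, temperature: float = 1.0) -> float:
    """Weighted mean logistic loss over pairs whose opponents may come from
    different historical checkpoints (the 'team' of opponents is simply pooled here)."""
    margins = [margin(p, lam) for p in pairs]
    weights = gap_weights(margins, temperature)
    return sum(w * logistic_loss(m) for w, m in zip(weights, margins)) / len(pairs)


if __name__ == "__main__":
    # Made-up log-probs: one opponent from an older checkpoint, one from a recent one.
    vs_old = Pair(policy_target=-10.0, ref_target=-12.0, policy_opponent=-15.0, ref_opponent=-14.0)
    vs_recent = Pair(policy_target=-10.0, ref_target=-11.0, policy_opponent=-10.5, ref_opponent=-11.0)
    print(team_self_play_loss([vs_old, vs_recent]))
```

In this sketch the weights are plain floats. In an autodiff framework one would likely stop gradients through them so that they only rescale each pair's contribution. The log-probabilities would come from model forward passes rather than constants.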

Code & Models

Repositories

lab-klc/TPAW (GitHub)