Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback

Seohyun Lee; Wenzhi Fang; Dong-Jun Han; Seyyedali Hosseinalipour; Christopher G. Brinton

arXiv:2605.07977 · cs.LG · May 11, 2026

TL;DR

SPEAR is a resource-efficient online federated learning algorithm that improves large language models through self-play and real-time feedback, without requiring ground-truth contexts.

Contribution

The paper introduces SPEAR, a novel online federated LLM fine-tuning method that combines advantage-weighted refinement with self-play driven by real-time feedback.

Findings

01

SPEAR outperforms state-of-the-art baselines on various benchmarks.

02

It enables resource-efficient online training without ground-truth contexts.

03

SPEAR effectively incorporates external feedback in federated settings.

Abstract

Recent works have advanced feedback-based learning systems, in which a foundation model takes in incoming feedback (e.g., from a user) to improve itself, forming a self-reinforcing training loop. However, existing works are restricted to an offline setup in order to support such feedback-based methods, and they further require privileged ground-truth contexts for training. Moreover, federated learning (FL) has received limited consideration, even though it is particularly well-suited to incorporating external feedback, for example across large networks of end users; FL, however, requires methods that are efficient enough to train on resource-constrained edge devices. We therefore introduce SPEAR (Self-Play Enhancement via Advantage-Weighted Refinement), an efficient online learning algorithm for federated LLM fine-tuning. SPEAR utilizes a feedback-guided self-play loop to…
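The abstract is cut off before it specifies SPEAR's update rule, so the following is only a generic sketch of the two ingredients named in the title, written under our own assumptions and not taken from the authors' code (see the lee3296/SPEAR repository for that). It assumes each client scores several self-generated responses with a scalar feedback reward, forms an advantage as reward minus the group-mean reward, and turns it into an exponential weight exp(A / beta) in the style of advantage-weighted regression; client updates are then merged with standard sample-count-weighted FedAvg. The function names and the parameters beta and max_weight are hypothetical.

```python
import math
from typing import List, Sequence


def advantage_weights(rewards: Sequence[float], beta: float = 1.0,
                      max_weight: float = 20.0) -> List[float]:
    """Hypothetical AWR-style weights for a group of self-play responses.

    Advantage = reward minus the group's mean reward (a simple baseline).
    Weight = exp(advantage / beta), clipped at max_weight to avoid blow-ups.
    """
    if not rewards:
        return []
    baseline = sum(rewards) / len(rewards)
    return [min(math.exp((r - baseline) / beta), max_weight) for r in rewards]


def weighted_nll(log_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Advantage-weighted negative log-likelihood, normalized by total weight.

    log_probs[i] is the model's log-probability of response i; responses with
    higher advantage pull the model more strongly toward themselves.
    """
    total = sum(weights)
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / total


def fedavg(client_params: Sequence[Sequence[float]],
           num_samples: Sequence[int]) -> List[float]:
    """Standard FedAvg: average flat client parameter vectors by sample count."""
    n_total = sum(num_samples)
    dim = len(client_params[0])
    return [sum(p[i] * n for p, n in zip(client_params, num_samples)) / n_total
            for i in range(dim)]


if __name__ == "__main__":
    # One liked response and two disliked ones (e.g., thumbs up/down feedback).
    w = advantage_weights([1.0, 0.0, 0.0])
    print(w)
    print(weighted_nll([-1.0, -2.0, -3.0], w))
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

The exponential weighting keeps every sample's contribution positive while emphasizing above-average responses, and the clip guards against a single high-reward sample dominating a small on-device batch; whether SPEAR uses this exact form is not stated in the text above.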

Code & Models

Repositories

lee3296/SPEAR (GitHub)