TL;DR
SPEAR is a resource-efficient online algorithm for federated LLM fine-tuning that improves models through self-play guided by real-time feedback, without requiring ground-truth contexts.
Contribution
The paper introduces SPEAR, an online federated LLM fine-tuning method that combines advantage-weighted refinement with self-play guided by real-time feedback.
Findings
SPEAR outperforms state-of-the-art baselines on various benchmarks.
It enables resource-efficient online training without ground-truth contexts.
SPEAR effectively incorporates external feedback in federated settings.
Abstract
Recent works have advanced feedback-based learning systems, in which a foundation model takes in incoming feedback (e.g., from a user) to self-improve, creating a self-contained training loop. However, existing works rely on an offline setup to enable such feedback-based methods, and further require privileged ground-truth contexts for training. Moreover, there is limited consideration of federated learning (FL), which is particularly well suited to incorporating external feedback across large networks of end users but requires methods that are efficient enough to train on resource-constrained edge devices. Therefore, we introduce SPEAR (Self-Play Enhancement via Advantage-Weighted Refinement), an efficient online learning algorithm for federated LLM fine-tuning. SPEAR utilizes a feedback-guided self-play loop to…
