PubSwap: Public-Data Off-Policy Coordination for Federated RLVR

Anupam Nayak; Baris Askin; Muhammed Ustaomeroglu; Carlee Joe-Wong; Gauri Joshi

arXiv:2604.12160·cs.LG·April 15, 2026

TL;DR

This paper introduces PubSwap, a federated RLVR framework that uses public data and low-rank adaptation to improve communication efficiency and coordination across decentralized organizations.

Contribution

The paper combines LoRA-based local adaptation with off-policy steps on a small shared public dataset, targeting scalable federated reinforcement learning from verifiable rewards (RLVR) without exposing private data.

Findings

01  Consistently improves performance on mathematical and medical reasoning benchmarks.

02  Enhances communication efficiency and cross-client coordination in federated RLVR.

03  Demonstrates effectiveness of combining low-rank adaptation with public-data exchange.

Abstract

Reasoning post-training with reinforcement learning from verifiable rewards (RLVR) is typically studied in centralized settings, yet many realistic applications involve decentralized private data distributed across organizations. Federated training is a natural solution, but scaling RLVR in this regime is challenging: full-model synchronization is expensive, and performing many local steps can cause severe client drift under heterogeneous data. We propose a federated RLVR framework that combines LoRA-based local adaptation with public-data-based off-policy steps to improve both communication efficiency and cross-client coordination. In particular, a small shared public dataset is used to periodically exchange and reuse response-level training signals across organizations, providing a lightweight anchor toward a more globally aligned objective without exposing private data. Our method…
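
The two ingredients named in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract does not give the update rule, so every name here (`PublicSample`, `lora_param_count`, `pool_public_samples`, `group_advantages`, `average_adapters`) is hypothetical. The mean-centered reward baseline is a common GRPO-style choice swapped in for illustration, and averaging flat adapter vectors is a simplification of whatever aggregation the authors actually use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicSample:
    """One response-level signal on a shared public prompt (hypothetical type)."""
    client_id: int
    prompt_id: int
    response: str
    reward: float  # verifiable reward, e.g. 1.0 if the final answer checks out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a rank-r update B @ A for a d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


def pool_public_samples(per_client_samples):
    """Merge every client's public-prompt samples into one pool keyed by prompt.

    Only responses to public prompts are shared; private data never leaves a client.
    """
    pool = defaultdict(list)
    for samples in per_client_samples:
        for sample in samples:
            pool[sample.prompt_id].append(sample)
    return dict(pool)


def group_advantages(samples):
    """Mean-centered rewards over one prompt's pooled responses.

    Assumption: a GRPO-style baseline, chosen for illustration only.
    """
    mean_reward = sum(s.reward for s in samples) / len(samples)
    return [s.reward - mean_reward for s in samples]


def average_adapters(adapters):
    """Element-wise mean of flat adapter parameter vectors, one vector per client.

    Simplification: averaging LoRA factors separately is not the same as
    averaging the products B @ A.
    """
    n_clients = len(adapters)
    return [sum(values) / n_clients for values in zip(*adapters)]


# Toy usage: two clients answer the same public prompt with different outcomes.
client_0 = [PublicSample(client_id=0, prompt_id=0, response="x = 4", reward=1.0)]
client_1 = [PublicSample(client_id=1, prompt_id=0, response="x = 5", reward=0.0)]
pool = pool_public_samples([client_0, client_1])
advantages = group_advantages(pool[0])
global_adapter = average_adapters([[1.0, 2.0], [3.0, 4.0]])
```

Under these assumptions, a client that reuses another client's response from the pool is training on text its own policy did not generate, which is what makes the step off-policy. The communication saving comes from synchronizing only adapter parameters: for a 4096 x 4096 weight at rank 8, `lora_param_count` gives 65,536 values against 16,777,216 for the full matrix.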
