FedGRPO: Privately Optimizing Foundation Models with Group-Relative Rewards from Domain Client
Gongxi Zhu, Hanlin Gu, Lixin Fan, Qiang Yang, Yuxing Han

TL;DR
FedGRPO introduces a privacy-preserving federated learning framework that uses group-relative rewards and expert selection to improve foundation model performance efficiently across diverse domains.
Contribution
It reformulates federated foundation model training as a reinforcement learning process, reducing privacy risks and communication costs while enhancing accuracy.
Findings
Outperforms baseline methods in accuracy across multiple domain tasks.
Reduces communication overhead compared to traditional FedFMs.
Maintains privacy by exchanging reward signals instead of data or model updates.
Abstract
One important direction of Federated Foundation Models (FedFMs) is leveraging data from small client models to enhance the performance of a large server-side foundation model. Existing methods based on model-level or representation-level knowledge transfer either require expensive local training or incur high communication costs and introduce unavoidable privacy risks. We reformulate this problem as a reinforcement-learning-style evaluation process and propose FedGRPO, a privacy-preserving framework comprising two modules. The first module performs competence-based expert selection by building a lightweight confidence graph from auxiliary data to identify the most suitable clients for each question. The second module leverages the "Group Relative" concept from the Group Relative Policy Optimization (GRPO) framework by packaging each question together with its solution rationale into…
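The "Group Relative" idea the abstract borrows from GRPO can be illustrated with a minimal sketch. In GRPO as generally described, each candidate answer to the same question receives a scalar reward, and that reward is normalized against the mean and standard deviation of its group. The code below shows only that generic normalization step; it is not taken from the paper. The function name group_relative_advantages, the eps constant, the use of the population standard deviation, and the example reward values are all assumptions made for illustration, and FedGRPO's actual reward aggregation across clients may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    Hypothetical helper: a sketch of generic GRPO-style normalization,
    not FedGRPO's exact procedure.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    # Population std is an assumption; some implementations use sample std.
    sigma = pstdev(rewards)
    # eps avoids division by zero when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate answers to one question.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Answers scoring above the group mean get a positive advantage and those below get a negative one, so only relative quality within the group matters. When all rewards in a group are identical, every advantage is zero and the group contributes no learning signal.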
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Recommender Systems and Techniques · Domain Adaptation and Few-Shot Learning
