FedMOA: Federated GRPO for Personalized Reasoning LLMs under Heterogeneous Rewards
Ziyao Wang, Daeun Jung, Yexiao He, Guoheng Sun, Zheyu Shen, Myungjin Lee, Ang Li

TL;DR
FedMOA is a federated reinforcement learning framework for personalized reasoning in large language models. It handles heterogeneous rewards and multi-objective optimization, improving both accuracy and personalization.
Contribution
The paper presents FedMOA, a novel federated GRPO method with adaptive weighting and task-aware aggregation for multi-objective alignment under heterogeneous rewards.
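According to the abstract, the adaptive weighting is computed online via hypergradient descent, but the truncated text does not give the update rule. The sketch below shows one generic way hypergradient descent can adapt objective weights, under loudly stated assumptions: a single weighted gradient step on toy quadratic objectives, a separate "reference" objective whose post-step loss the weights are tuned to reduce, and clip-and-renormalize to keep the weights summing to one. The names `hypergradient_weight_step`, `reference_grad` and `quad_grad` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
GradFn = Callable[[Sequence[float]], Vector]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hypergradient_weight_step(
    theta: Sequence[float],
    objective_grads: Sequence[GradFn],
    reference_grad: GradFn,
    weights: Sequence[float],
    lr: float = 0.1,
    hyper_lr: float = 0.5,
    eps: float = 1e-6,
) -> Tuple[Vector, Vector]:
    """One parameter step plus one hypergradient step on the objective weights.

    Hypothetical setup (not FedMOA's published rule): the model takes
    theta' = theta - lr * sum_k w_k * g_k, and each weight w_k moves along
    -dL_ref(theta')/dw_k = lr * <grad L_ref(theta'), g_k>.
    """
    # Per-objective gradients at the current parameters.
    grads = [g(theta) for g in objective_grads]
    # Weighted multi-objective gradient step: sum_k w_k * g_k[i] per coordinate.
    combined = [dot(weights, [g[i] for g in grads]) for i in range(len(theta))]
    new_theta = [t - lr * c for t, c in zip(theta, combined)]
    # Gradient of the reference objective after the step.
    h = reference_grad(new_theta)
    # Chain rule: d theta'/d w_k = -lr * g_k, so dL_ref/dw_k = -lr * <h, g_k>.
    hypergrads = [-lr * dot(h, g) for g in grads]
    # Descend on the hypergradient, clip at eps, then renormalize to sum to 1.
    raw = [max(w - hyper_lr * hg, eps) for w, hg in zip(weights, hypergrads)]
    total = sum(raw)
    return new_theta, [r / total for r in raw]


def quad_grad(target: Sequence[float]) -> GradFn:
    """Gradient of the toy loss ||theta - target||^2."""
    return lambda th: [2.0 * (t - c) for t, c in zip(th, target)]


# Toy run: objective A pulls toward a, objective B toward b; the reference
# objective (a stand-in for, e.g., a held-out reward) agrees with A.
a, b = [1.0, 0.0], [0.0, 1.0]
theta, weights = [0.0, 0.0], [0.5, 0.5]
for _ in range(20):
    theta, weights = hypergradient_weight_step(
        theta, [quad_grad(a), quad_grad(b)], quad_grad(a), weights
    )
print(weights)  # weight on A ends up larger than weight on B
```

The design choice to differentiate through one update step (rather than through the whole training trajectory) is what keeps hypergradient descent cheap enough to run online; whether FedMOA uses the same one-step approximation is not stated in this summary.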
Findings
Achieves up to 2.2% accuracy improvement over federated averaging.
Enhances global performance and personalization in reasoning tasks.
Effectively manages heterogeneous rewards and multi-objective optimization.
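The accuracy gain above is measured against federated averaging (FedAvg), whose server step is a mean of client parameters weighted by each client's local sample count. The sketch below shows only that baseline step; FedMOA's task-aware aggregation is an alternative to it, and its exact rule is not given in this summary. Parameters are represented as flat float lists purely for illustration (a real system would aggregate model tensors, which is an assumption here), and `fedavg` is a hypothetical helper name.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def fedavg(client_params: Sequence[Params], num_samples: Sequence[int]) -> Params:
    """Server-side FedAvg: average each tensor, weighting client k by n_k / n."""
    total = sum(num_samples)
    aggregated: Params = {}
    for name, first in client_params[0].items():
        aggregated[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, num_samples)) / total
            for i in range(len(first))
        ]
    return aggregated


# Two hypothetical clients, each holding one flattened parameter tensor "w".
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
global_params = fedavg(clients, num_samples=[1, 3])
print(global_params)  # {'w': [2.5, 3.5]}
```

Because the weights depend only on data volume, FedAvg treats every client's update as equally relevant per sample, regardless of which reward definition or task produced it.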
Abstract
Group Relative Policy Optimization (GRPO) has recently emerged as an effective approach for improving the reasoning capabilities of large language models through online multi-objective reinforcement learning. While personalization on private data is increasingly vital, traditional Reinforcement Learning (RL) alignment is often memory-prohibitive for on-device federated learning due to the overhead of maintaining a separate critic network. GRPO's critic-free architecture enables feasible on-device training, yet transitioning to a federated setting introduces systemic challenges: heterogeneous reward definitions, imbalanced multi-objective optimization, and high training costs. We propose FedMOA, a federated GRPO framework for multi-objective alignment under heterogeneous rewards. FedMOA stabilizes local training through an online adaptive weighting mechanism via hypergradient descent,…
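The critic-free property mentioned in the abstract comes from how GRPO computes advantages: several responses are sampled for the same prompt, and each response's reward is standardized against the mean and standard deviation of its own group, so no value network is needed as a baseline. The sketch below shows that advantage computation, with multi-objective rewards first scalarized by a fixed weight vector. The weights, the two reward functions, and the helper names `combine_rewards` and `group_relative_advantages` are hypothetical; in FedMOA the weights are adapted online rather than fixed, and the clipped policy-ratio objective and KL term of GRPO are not shown.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def combine_rewards(
    per_objective: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Scalarize rewards: per_objective[k][i] is response i's reward under objective k."""
    num_responses = len(per_objective[0])
    return [
        sum(w * rewards[i] for w, rewards in zip(weights, per_objective))
        for i in range(num_responses)
    ]


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: standardize each reward against its own group.

    The group mean plays the role a learned critic's value estimate plays in
    PPO, which is why no separate value network has to be kept in memory.
    """
    mu = fmean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored by two hypothetical reward functions.
correctness = [1.0, 0.0, 1.0, 0.0]
formatting = [1.0, 1.0, 0.0, 0.0]
rewards = combine_rewards([correctness, formatting], weights=[0.7, 0.3])
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Since advantages are centered within each group, they sum to approximately zero: responses better than their siblings are reinforced and worse ones are penalized, whatever the absolute reward scale of a given client.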
Taxonomy
Topics: Reinforcement Learning in Robotics · Topic Modeling · Domain Adaptation and Few-Shot Learning
