Federated Offline Policy Optimization with Dual Regularization
Sheng Yue, Zerui Qin, Xingyuan Hua, Yongheng Deng, Ju Ren

TL;DR
This paper introduces DRPO, an offline federated reinforcement learning algorithm that enables multiple agents to collaboratively learn decision policies solely from static data, avoiding costly environment interactions.
Contribution
It proposes a novel dual regularization approach for offline federated policy optimization, addressing distributional shifts and ensuring policy improvement without environment interaction.
Findings
DRPO outperforms baseline methods in experiments.
Theoretical analysis shows effective handling of distributional shifts.
DRPO ensures policy improvement in each federated learning round.
Abstract
Federated Reinforcement Learning (FRL) has been deemed a promising solution for intelligent decision-making in the era of Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains. To overcome this challenge, this paper proposes a novel offline federated policy optimization algorithm, named DRPO, which enables distributed agents to collaboratively learn a decision policy only from private and static data without further environmental interactions. DRPO leverages dual regularization, incorporating both the local behavioral policy and the global aggregated policy, to judiciously cope with the intrinsic two-tier distributional shifts in offline FRL. Theoretical analysis characterizes the impact of…
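The dual regularization idea in the abstract can be illustrated with a small sketch. This is not the paper's objective, which the excerpt above does not state. It assumes a single state with discrete actions, KL divergence as the regularizer, and hypothetical weights lam_local and lam_global. All function names are illustrative. Under those assumptions, the policy that maximizes expected Q-value minus both penalties has a closed form: a softmax over the Q-values blended with the log-probabilities of the two reference policies.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def dual_regularized_objective(policy, q_values, behavior_policy, global_policy,
                               lam_local=1.0, lam_global=1.0):
    """Expected Q-value under `policy`, penalized by its divergence from both
    the local behavioral policy and the global aggregated policy.
    Assumed form for illustration only; weights are hypothetical."""
    expected_q = sum(p * q for p, q in zip(policy, q_values))
    local_penalty = lam_local * kl_divergence(policy, behavior_policy)
    global_penalty = lam_global * kl_divergence(policy, global_policy)
    return expected_q - local_penalty - global_penalty


def dual_regularized_policy(q_values, behavior_policy, global_policy,
                            lam_local=1.0, lam_global=1.0, eps=1e-12):
    """Closed-form maximizer of the objective above over the probability simplex:
    log pi(a) = (Q(a) + lam_local*log b(a) + lam_global*log g(a)) / (lam_local + lam_global) + const."""
    total = lam_local + lam_global
    logits = [
        (q + lam_local * math.log(b + eps) + lam_global * math.log(g + eps)) / total
        for q, b, g in zip(q_values, behavior_policy, global_policy)
    ]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - shift) for l in logits]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


# Toy example: three actions, made-up numbers.
q_values = [1.0, 0.0, 2.0]
behavior_policy = [0.6, 0.3, 0.1]   # what the local static dataset did
global_policy = [0.2, 0.5, 0.3]     # what the aggregated policy does
new_policy = dual_regularized_policy(q_values, behavior_policy, global_policy)
```

In this toy setting, raising lam_local pulls the result toward the behavioral policy (staying close to the local data), while raising lam_global pulls it toward the aggregated policy (staying close to the federation's consensus).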
Taxonomy
Topics: Optimization and Search Problems · Advanced Data Storage Technologies · Stochastic Gradient Optimization Techniques
