Federated Offline Policy Optimization with Dual Regularization

Sheng Yue; Zerui Qin; Xingyuan Hua; Yongheng Deng; Ju Ren

arXiv:2405.17474·cs.LG·May 30, 2024


TL;DR

This paper introduces DRPO, an offline federated reinforcement learning algorithm that enables multiple agents to collaboratively learn decision policies solely from static data, avoiding costly environment interactions.

Contribution

The paper proposes a dual regularization approach for offline federated policy optimization. By regularizing against both the local behavioral policy and the global aggregated policy, it addresses distributional shift and ensures policy improvement without environment interaction.

Findings

1. DRPO outperforms baseline methods in experiments.

2. Theoretical analysis shows that DRPO handles distributional shifts effectively.

3. DRPO ensures policy improvement in each federated learning round.

Abstract

Federated Reinforcement Learning (FRL) has been deemed a promising solution for intelligent decision-making in the era of Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains. To overcome this challenge, this paper proposes a novel offline federated policy optimization algorithm, named DRPO, which enables distributed agents to collaboratively learn a decision policy only from private and static data without further environmental interactions. DRPO leverages dual regularization, incorporating both the local behavioral policy and the global aggregated policy, to judiciously cope with the intrinsic two-tier distributional shifts in offline FRL. Theoretical analysis characterizes the impact of…
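To make the idea of dual regularization concrete, the following is a minimal sketch and not the paper's algorithm. The abstract does not state the form of the regularizers, so this example assumes two KL penalties, one toward the local behavioral policy and one toward the global aggregated policy, in a single state with finitely many actions. The function name `dual_regularized_policy`, the weights `alpha` and `beta`, and the sample numbers are all hypothetical. Under these assumptions the regularized objective has a closed-form maximizer, which the code computes.

```python
import math


def dual_regularized_policy(q_values, behavior_policy, global_policy,
                            alpha=1.0, beta=1.0):
    """Illustrative single-state policy update with two KL penalties.

    Returns the maximizer over pi of
        sum_a pi(a) * Q(a)
        - alpha * KL(pi || behavior_policy)
        - beta  * KL(pi || global_policy),
    which is pi(a) proportional to
        exp((Q(a) + alpha*log pi_b(a) + beta*log pi_g(a)) / (alpha + beta)).

    Assumption: every probability in both reference policies is > 0,
    and alpha + beta > 0. This is a toy objective, not DRPO itself.
    """
    total = alpha + beta
    logits = [
        (q + alpha * math.log(pb) + beta * math.log(pg)) / total
        for q, pb, pg in zip(q_values, behavior_policy, global_policy)
    ]
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    weights = [math.exp(l - max_logit) for l in logits]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


# Hypothetical inputs: action values estimated from static data,
# the policy that generated the local dataset, and the server's policy.
q = [1.0, 0.0, 0.5]
behavior = [0.2, 0.6, 0.2]
global_pi = [0.4, 0.3, 0.3]

pi = dual_regularized_policy(q, behavior, global_pi, alpha=1.0, beta=1.0)
```

In this toy setting, a larger `alpha` keeps the updated policy closer to the policy that produced the local data, while a larger `beta` keeps it closer to the aggregated global policy. When all action values are equal and the two reference policies coincide, the update returns that shared policy unchanged.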


Taxonomy

Topics: Optimization and Search Problems · Advanced Data Storage Technologies · Stochastic Gradient Optimization Techniques