FORLER: Federated Offline Reinforcement Learning with Q-Ensemble and Actor Rectification
Nan Qiao, Sheng Yue

TL;DR
FORLER introduces a novel federated offline reinforcement learning framework that combines Q-ensemble aggregation and actor rectification, improving robustness and performance in heterogeneous, low-quality data environments.
Contribution
FORLER integrates server-side Q-ensemble aggregation with on-device actor rectification, addressing both policy pollution and device computational constraints in offline federated RL.
Findings
Outperforms strong baselines across datasets of varying quality.
Provides theoretical guarantees for safe policy improvement.
Effectively mitigates policy pollution in heterogeneous data settings.
Abstract
In Internet-of-Things systems, federated learning has advanced online reinforcement learning (RL) by enabling parallel policy training without sharing raw data. However, interacting with real environments online can be risky and costly, motivating offline federated RL (FRL), where local devices learn from fixed datasets. Despite its promise, offline FRL may break down under low-quality, heterogeneous data. Offline RL tends to get stuck in local optima, and in FRL, one device's suboptimal policy can degrade the aggregated model, i.e., policy pollution. We present FORLER, combining Q-ensemble aggregation on the server with actor rectification on devices. The server robustly merges device Q-functions to curb policy pollution and shift heavy computation off resource-constrained hardware without compromising privacy. Locally, actor rectification enriches policy gradients via a zeroth-order…
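To make the policy-pollution idea concrete, the sketch below contrasts a plain average of device Q-estimates with a pessimistic ensemble rule (mean minus a multiple of the standard deviation across devices). This is only an illustration under stated assumptions: the function names, the `penalty` parameter, and the mean-minus-std rule itself are hypothetical stand-ins for a generic robust ensemble, not the aggregation rule defined in the paper.

```python
import numpy as np


def naive_average(device_q_values):
    """Plain averaging of per-device Q-estimates (no robustness)."""
    q = np.asarray(device_q_values, dtype=float)  # shape: (num_devices, num_actions)
    return q.mean(axis=0)


def aggregate_q_ensemble(device_q_values, penalty=1.0):
    """Hypothetical pessimistic aggregation: mean minus penalty * std across devices.

    Assumption: this is a common ensemble heuristic used here for illustration,
    not the FORLER aggregation rule.
    """
    q = np.asarray(device_q_values, dtype=float)
    return q.mean(axis=0) - penalty * q.std(axis=0)


# Three devices score two candidate actions for the same state.
# The third device wildly overestimates action 1 (a "polluting" estimate).
device_q = [
    [1.0, 1.2],
    [1.1, 1.0],
    [0.9, 9.0],
]

naive = naive_average(device_q)
robust = aggregate_q_ensemble(device_q)

print("naive pick: ", int(np.argmax(naive)))
print("robust pick:", int(np.argmax(robust)))
```

With these made-up numbers, plain averaging is pulled toward the action that a single device overestimates, while the disagreement penalty steers the aggregate back to the action on which the devices agree.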
Taxonomy
Topics: Privacy-Preserving Technologies in Data · IoT and Edge/Fog Computing · Age of Information Optimization
