CORE: Compensable Reward as a Catalyst for Improving Offline RL in Wireless Networks
Lipeng Zu, Hansong Zhou, Yu Qian, Shayok Chakraborty, Yukun Yuan, Linke Guo, Xiaonan Zhang

TL;DR
CORE is an offline reinforcement learning framework tailored for wireless networks. It clusters behavior embeddings to identify expert trajectories and constructs compensable rewards to improve policy learning from limited and noisy data.
Contribution
This work is one of the early systematic explorations of offline RL in wireless networking. It proposes a behavior-clustering and reward-construction method for handling noisy, limited data.
Findings
Effective identification of expert trajectories from noisy data
Improved policy performance using compensable rewards
One of the early systematic explorations of offline RL in wireless networks
Abstract
Real-world wireless data are expensive to collect and often lack sufficient expert demonstrations, causing existing offline RL methods to overfit suboptimal behaviors and exhibit unstable performance. To address this issue, we propose CORE, an offline RL framework specifically designed for wireless environments. CORE identifies latent expert trajectories from noisy datasets via behavior embedding clustering, and trains a conditional variational autoencoder with a contrastive objective to separate expert and non-expert behaviors in latent space. Based on the learned representations, CORE constructs compensable rewards that reflect expert-likelihood, effectively guiding policy learning under limited or imperfect supervision. More broadly, this work represents one of the early systematic explorations of offline RL in wireless networking, where prior adoption remains limited. Beyond…
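The reward-construction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes trajectory embeddings are already available (the conditional variational autoencoder and contrastive objective are omitted), uses plain k-means for the clustering step, treats the cluster with the highest mean return as the "expert" cluster, and uses exp(-distance to the expert centroid) as a stand-in expert-likelihood score. The names `kmeans`, `compensated_rewards`, `k`, and `beta` are hypothetical.

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]  # copy of k rows
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's centroid as-is
                centroids[j] = x[labels == j].mean(axis=0)
    # Final assignment against the final centroids.
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

def compensated_rewards(embeddings, returns, rewards, k=2, beta=0.5):
    """Add an expert-likelihood bonus to per-trajectory rewards.

    Assumption: the cluster with the highest mean return is "expert".
    Assumption: exp(-distance to expert centroid) proxies expert-likelihood.
    """
    labels, centroids = kmeans(embeddings, k)
    cluster_return = np.array([
        returns[labels == j].mean() if np.any(labels == j) else -np.inf
        for j in range(k)
    ])
    expert = int(cluster_return.argmax())
    dist = np.linalg.norm(embeddings - centroids[expert], axis=1)
    expert_score = np.exp(-dist)  # in (0, 1], largest near the expert centroid
    return rewards + beta * expert_score, expert_score, labels == expert

# Synthetic data: 40 noisy trajectories and 10 expert-like ones.
rng = np.random.default_rng(1)
noisy = rng.normal(0.0, 0.3, size=(40, 2))
expert_like = rng.normal(5.0, 0.3, size=(10, 2))
emb = np.vstack([noisy, expert_like])
ret = np.concatenate([rng.normal(1.0, 0.1, 40), rng.normal(3.0, 0.1, 10)])

shaped, score, is_expert = compensated_rewards(emb, ret, ret)
print(int(is_expert.sum()), float(score[40:].mean()) > float(score[:40].mean()))
```

In this toy setup the bonus is largest for trajectories whose embeddings lie close to the expert cluster, so an offline RL learner trained on `shaped` would be nudged toward expert-like behavior even when few expert trajectories are present.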
Taxonomy
Topics: Wireless Networks and Protocols · Advanced MIMO Systems Optimization · Mobile Ad Hoc Networks
