CORE: Compensable Reward as a Catalyst for Improving Offline RL in Wireless Networks

Lipeng Zu; Hansong Zhou; Yu Qian; Shayok Chakraborty; Yukun Yuan; Linke Guo; Xiaonan Zhang

arXiv:2512.19671·cs.NI·December 23, 2025

TL;DR

CORE is an offline reinforcement learning (RL) framework for wireless networks. It clusters behavior embeddings to identify latent expert trajectories and constructs compensable rewards that guide policy learning from limited, noisy data.

Contribution

This work is one of the early systematic explorations of offline RL in wireless networking. It proposes behavior embedding clustering and compensable reward construction to learn policies from limited, noisy data.

Findings

01

Effective identification of expert trajectories from noisy data

02

Improved policy performance using compensable rewards

03

One of the early systematic explorations of offline RL in wireless networks

Abstract

Real-world wireless data are expensive to collect and often lack sufficient expert demonstrations, causing existing offline RL methods to overfit suboptimal behaviors and exhibit unstable performance. To address this issue, we propose CORE, an offline RL framework specifically designed for wireless environments. CORE identifies latent expert trajectories from noisy datasets via behavior embedding clustering, and trains a conditional variational autoencoder with a contrastive objective to separate expert and non-expert behaviors in latent space. Based on the learned representations, CORE constructs compensable rewards that reflect expert-likelihood, effectively guiding policy learning under limited or imperfect supervision. More broadly, this work represents one of the early systematic explorations of offline RL in wireless networking, where prior adoption remains limited. Beyond…
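The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes trajectory embeddings are already available (the paper learns them with a conditional variational autoencoder and a contrastive objective, which is skipped here). It also assumes the expert cluster is the one with the highest mean return, and that the reward bonus decays exponentially with distance to that cluster's centroid. The names `kmeans`, `compensated_rewards`, and `beta` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return centroids, labels


def compensated_rewards(embeddings, returns, rewards, beta=1.0, k=2):
    """Add an expert-likeness bonus to each trajectory's reward.

    Assumption (not from the paper): the cluster with the highest mean
    return is treated as the expert cluster, and the bonus is
    beta * exp(-distance to the expert centroid).
    """
    centroids, labels = kmeans(embeddings, k)

    def mean_return(c):
        vals = [r for r, lab in zip(returns, labels) if lab == c]
        return sum(vals) / len(vals) if vals else float("-inf")

    expert = max(range(k), key=mean_return)
    shaped = [
        r + beta * math.exp(-math.dist(e, centroids[expert]))
        for e, r in zip(embeddings, rewards)
    ]
    return shaped, labels, expert


# Toy data: three low-return and three high-return trajectories.
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
returns = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]
rewards = [0.0] * 6
shaped, labels, expert = compensated_rewards(embeddings, returns, rewards)
```

In this toy run the trajectories near the high-return cluster receive a bonus close to `beta`, while the others receive almost none. An offline RL learner could then be trained on the shaped rewards in place of the raw ones.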

Taxonomy

Topics: Wireless Networks and Protocols · Advanced MIMO Systems Optimization · Mobile Ad Hoc Networks