Learning Power Control from a Fixed Batch of Data
Mohammad G. Khoshkholgh, Halim Yanikomeroglu

TL;DR
This paper uses offline deep reinforcement learning to learn a power control policy for a new, unexplored environment solely from data gathered in a monitored environment, and shows that the agent learns quickly despite differences between the two environments.
Contribution
It introduces a method for effective power control policy transfer using offline RL with limited and sub-optimal data, addressing environmental discrepancies.
Findings
Agent learns power control quickly in new environments.
Only about one third of the collected data needs to be of high quality.
The remaining data can come from any sub-optimal algorithm.
Abstract
We address how to exploit power control data, gathered from a monitored environment, for performing power control in an unexplored environment. We adopt offline deep reinforcement learning, whereby the agent learns the policy to produce the transmission powers solely by using the data. Experiments demonstrate that despite discrepancies between the monitored and unexplored environments, the agent successfully learns the power control very quickly, even if the objective functions in the monitored and unexplored environments are dissimilar. It is sufficient for about one third of the collected data to be of high quality; the rest can come from any sub-optimal algorithm.
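The idea of learning only from a fixed batch can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a deep network, while the sketch below substitutes tabular Q-learning over discrete states and power levels to stay short. The function names `build_fixed_batch` and `offline_q_learning`, the `(state, action, reward, next_state)` tuple layout, and all hyperparameters are assumptions made for illustration.

```python
import random


def build_fixed_batch(high_quality, sub_optimal, size,
                      high_quality_fraction=1 / 3, seed=0):
    """Mix transitions so about `high_quality_fraction` of the batch is high-quality.

    Hypothetical helper: each transition is (state, action, reward, next_state).
    """
    rng = random.Random(seed)
    n_good = round(size * high_quality_fraction)
    n_rest = size - n_good
    if n_good > len(high_quality) or n_rest > len(sub_optimal):
        raise ValueError("not enough transitions to build the batch")
    batch = rng.sample(high_quality, n_good) + rng.sample(sub_optimal, n_rest)
    rng.shuffle(batch)
    return batch


def offline_q_learning(batch, n_states, n_actions,
                       epochs=50, lr=0.1, gamma=0.9):
    """Tabular Q-learning that only replays the fixed batch.

    Assumption: a tabular stand-in for the deep Q-network; the agent never
    interacts with the environment during training.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        for state, action, reward, next_state in batch:
            target = reward + gamma * max(q[next_state])
            q[state][action] += lr * (target - q[state][action])
    return q


# Toy data: one state, two power levels; level 1 is rewarded, level 0 is not.
high_quality = [(0, 1, 1.0, 0) for _ in range(10)]
sub_optimal = [(0, 0, 0.0, 0) for _ in range(20)]
batch = build_fixed_batch(high_quality, sub_optimal, size=9)
q_table = offline_q_learning(batch, n_states=1, n_actions=2)
greedy_power_level = max(range(2), key=lambda a: q_table[0][a])
```

In this toy setting the greedy policy recovered from the batch picks the rewarded power level even though two thirds of the transitions come from the unrewarded behaviour. It says nothing about how well the same ratio works in the paper's wireless experiments.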
