Online Learning Schemes for Power Allocation in Energy Harvesting Communications
Pranav Sakulkar, Bhaskar Krishnamachari

TL;DR
This paper introduces online learning algorithms for power allocation in energy harvesting communication systems, maximizing the average data rate over a time-varying channel with unknown distribution by adaptively learning the unknown rewards.
Contribution
It proposes two novel algorithms, UCLP and Epoch-UCLP, for online learning of power policies in energy harvesting systems modeled as MDPs, with proven regret bounds.
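Both algorithms have to choose transmit powers while the rewards (rates) of the battery-state/power pairs are still unknown. The name UCLP suggests upper confidence bounds on those rewards, and the sketch below assumes that idea. It keeps an optimistic estimate for each pair using a generic UCB1-style bonus sqrt(c ln t / n). The class `UCBRewardTable`, the weight `c`, and the form of the bonus are hypothetical illustrations and are not taken from the paper.

```python
import math
from collections import defaultdict


class UCBRewardTable:
    """Hypothetical helper: per-(state, action) reward estimates with a
    UCB1-style optimistic index. Not the paper's exact confidence term."""

    def __init__(self, c: float = 2.0) -> None:
        self.c = c                       # exploration weight (assumed value)
        self.counts = defaultdict(int)   # n(s, a): observations of each pair
        self.means = defaultdict(float)  # sample-mean reward (rate) of each pair
        self.t = 0                       # total observations across all pairs

    def update(self, state, action, reward: float) -> None:
        """Record the rate observed after using power `action` in battery state `state`."""
        key = (state, action)
        self.t += 1
        self.counts[key] += 1
        # Incremental sample mean: m_n = m_{n-1} + (r - m_{n-1}) / n
        self.means[key] += (reward - self.means[key]) / self.counts[key]

    def index(self, state, action) -> float:
        """Optimistic reward estimate: sample mean plus a confidence bonus.
        Pairs never observed get +inf so they are tried at least once."""
        key = (state, action)
        n = self.counts.get(key, 0)
        if n == 0:
            return math.inf
        return self.means[key] + math.sqrt(self.c * math.log(self.t) / n)


if __name__ == "__main__":
    table = UCBRewardTable()
    table.update(state=2, action=1, reward=1.0)
    table.update(state=2, action=1, reward=0.5)
    print(table.means[(2, 1)])  # 0.75
    print(table.index(2, 0))    # inf (pair never observed)
```

In an LP-based scheme, optimistic indices like these could stand in for the unknown rewards when a policy is computed. This pairing is an illustrative assumption, not a description of the paper's algorithm.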
Findings
Both algorithms achieve bounded regret: the cumulative rate lost relative to the optimal power policy stays bounded over time, so performance approaches the optimum.
Epoch-UCLP significantly reduces computational complexity compared to UCLP.
The algorithms can be adapted to related online cost-minimization problems.
Abstract
We consider the problem of power allocation over a time-varying channel with unknown distribution in energy harvesting communication systems. In this problem, the transmitter has to choose the transmit power based on the amount of stored energy in its battery with the goal of maximizing the average rate obtained over time. We model this problem as a Markov decision process (MDP) with the transmitter as the agent, the battery status as the state, the transmit power as the action and the rate obtained as the reward. The average reward maximization problem over the MDP can be solved by a linear program (LP) that uses the transition probabilities for the state-action pairs and their reward values to choose a power allocation policy. Since the rewards associated with the state-action pairs are unknown, we propose two online learning algorithms: UCLP and Epoch-UCLP that learn these rewards and…
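For reference, the standard occupation-measure LP for an average-reward MDP is written out below; the paper's exact formulation may differ. The notation is ours. x(s,a) is the long-run fraction of time slots spent in battery state s using transmit power a. P(s'|s,a) is the transition probability and r(s,a) is the expected rate. A(s), the set of powers feasible with the stored energy, is an assumption. The form assumes a unichain MDP.

```latex
\begin{aligned}
\max_{x \ge 0}\quad & \sum_{s \in \mathcal{S}} \sum_{a \in \mathcal{A}(s)} r(s,a)\, x(s,a) \\
\text{s.t.}\quad & \sum_{a \in \mathcal{A}(s')} x(s',a) = \sum_{s \in \mathcal{S}} \sum_{a \in \mathcal{A}(s)} P(s' \mid s,a)\, x(s,a), \qquad \forall s' \in \mathcal{S}, \\
& \sum_{s \in \mathcal{S}} \sum_{a \in \mathcal{A}(s)} x(s,a) = 1,
\end{aligned}
\qquad
\pi^*(a \mid s) = \frac{x^*(s,a)}{\sum_{a' \in \mathcal{A}(s)} x^*(s,a')}.
```

The first constraint balances probability flow into and out of each battery state, and the second normalizes x to a distribution. The policy is read off the optimal solution x* for every state with positive mass. When r(s,a) is unknown, as in this problem, an online scheme has to replace it with estimates.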
