Q-greedyUCB: a New Exploration Policy for Adaptive and Resource-efficient Scheduling

Yu Zhao; Joohyun Lee; Wei Chen

arXiv:2006.05902·eess.SY·June 11, 2020

Open Access

TL;DR

This paper introduces Q-greedyUCB, a novel reinforcement learning algorithm for adaptive scheduling that optimally balances delay and energy consumption in communication systems, demonstrating improved efficiency and convergence.

Contribution

It develops Q-greedyUCB, a new RL algorithm that combines Q-learning and UCB for constrained scheduling, proves its convergence, and reports that it outperforms existing methods.

Findings

01. Q-greedyUCB achieves optimal scheduling strategies.

02. It reduces regret by up to 12% compared to baseline algorithms.

03. The algorithm converges faster and is more efficient in simulations.

Abstract

This paper proposes a learning algorithm to find a scheduling policy that achieves an optimal delay-power trade-off in communication systems. Reinforcement learning (RL) is used to minimize the expected latency under a given energy constraint when environment conditions such as traffic arrival rates or channel conditions can change over time. To this end, the problem is formulated as an infinite-horizon constrained Markov Decision Process (MDP), and the constraint is handled with Lagrangian relaxation. We then propose Q-greedyUCB, a variant of Q-learning that combines Q-learning for the average-reward setting with an Upper Confidence Bound (UCB) policy, to solve this decision-making problem. We prove through mathematical analysis that the Q-greedyUCB algorithm converges. Simulation results show that Q-greedyUCB finds an optimal…
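
The ingredients named in the abstract (a Lagrangian-relaxed cost, average-reward Q-learning, and UCB-style exploration) can be illustrated on a toy queue. The sketch below is not the authors' Q-greedyUCB algorithm. It is a minimal illustration that assumes a single queue with Bernoulli arrivals, a quadratic power cost, a fixed Lagrange multiplier `lam`, and a relative-value (RVI-style) average-cost update. All names and parameter values (`train`, `power`, `bonus`, `arrival_p`, and so on) are hypothetical.

```python
import math
import random


def power(s):
    # Assumed convex power cost of sending s packets in one slot (illustrative only).
    return float(s * s)


def feasible_actions(q, s_max):
    # The scheduler can send at most the queued packets, capped at s_max per slot.
    return range(min(q, s_max) + 1)


def train(steps=20000, q_max=5, s_max=2, arrival_p=0.4, lam=0.5, bonus=1.0, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * (s_max + 1) for _ in range(q_max + 1)]  # relative cost estimates
    N = [[0] * (s_max + 1) for _ in range(q_max + 1)]    # visit counts per (state, action)
    q = 0
    for t in range(1, steps + 1):
        acts = list(feasible_actions(q, s_max))
        untried = [a for a in acts if N[q][a] == 0]
        if untried:
            a = untried[0]  # try every feasible action once before using the bonus
        else:
            # UCB-style optimism for costs: subtract an exploration bonus, then minimize.
            a = min(acts, key=lambda b: Q[q][b] - bonus * math.sqrt(math.log(t) / N[q][b]))
        N[q][a] += 1

        # Lagrangian cost: queue length (delay proxy) plus lam times the power spent.
        cost = q + lam * power(a)
        arrival = 1 if rng.random() < arrival_p else 0
        q_next = min(q - a + arrival, q_max)  # packets beyond q_max are dropped

        # Average-cost update: subtract the value at a reference pair, here (0, 0).
        best_next = min(Q[q_next][b] for b in feasible_actions(q_next, s_max))
        target = cost + best_next - Q[0][0]
        Q[q][a] += (target - Q[q][a]) / N[q][a]
        q = q_next

    # Greedy policy: packets to send for each queue length.
    policy = [min(feasible_actions(s, s_max), key=lambda b: Q[s][b]) for s in range(q_max + 1)]
    return Q, policy


if __name__ == "__main__":
    _, policy = train()
    print("packets to send per queue length:", policy)
```

In this sketch, raising `lam` penalizes power more heavily and pushes the learned policy toward sending fewer packets per slot, which is the delay-power trade-off the Lagrangian relaxation exposes. A full treatment would also tune `lam` until the energy constraint is met, a step omitted here.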

Taxonomy

Topics: Age of Information Optimization · Advanced MIMO Systems Optimization · Advanced Wireless Network Optimization