# Delay-Optimal Probabilistic Scheduling with Arbitrary Arrival and Adaptive Transmission

**Authors:** Xiang Chen, Wei Chen, Joohyun Lee, and Ness B. Shroff

arXiv: 1702.08052 · 2017-11-01

## TL;DR

This paper derives the optimal delay-power tradeoff for adaptive transmission scheduling with arbitrary i.i.d. arrivals. It models the problem as a Constrained Markov Decision Process and adds a reinforcement learning algorithm for the case where the arrival distribution is unknown.

## Contribution

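The paper models the delay-power tradeoff as a CMDP that minimizes average delay under an average power constraint and shows that the optimal policy is threshold-based. It proposes an algorithm that computes the optimal tradeoff curve and policy at much lower complexity than a general Linear Programming approach, and a reinforcement learning algorithm for unknown arrival distributions.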
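As a rough sketch of the formulation described here, the constrained problem and its Lagrangian relaxation can be written as follows. The notation is an assumption made for illustration and may differ from the paper's: $q[t]$ is the queue length, $s[t]$ the number of packets transmitted in slot $t$, $P(\cdot)$ the convex power function, $\bar{a}$ the mean arrival rate, $P_{\mathrm{th}}$ the power budget, and $\mu \ge 0$ the Lagrange multiplier. Expressing delay through Little's law is also an assumption.

$$
\min_{\pi}\; D^{\pi} = \frac{1}{\bar{a}} \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}^{\pi}\!\left[\sum_{t=1}^{T} q[t]\right]
\quad \text{s.t.} \quad
P^{\pi} = \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}^{\pi}\!\left[\sum_{t=1}^{T} P(s[t])\right] \le P_{\mathrm{th}}
$$

$$
\min_{\pi}\; D^{\pi} + \mu\, P^{\pi}, \qquad \mu \ge 0
$$

Under this reading, each value of $\mu$ selects a point where a line of slope $-\mu$ supports the delay-power tradeoff curve, which fits a curve that is decreasing, convex, and piecewise linear.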

## Key findings

- The optimal delay-power tradeoff curve is decreasing, convex, and piecewise linear.
- The optimal scheduling policy is threshold-based.
- A reinforcement learning algorithm obtains the optimal policy when the arrival distribution is unknown to the scheduler.

## Abstract

In this paper, we aim to obtain the optimal delay-power tradeoff and the corresponding optimal scheduling policy for an arbitrary i.i.d. arrival process and adaptive transmissions. The number of backlogged packets at the transmitter is known to a scheduler, which has to determine how many backlogged packets to transmit during each time slot. The power consumption is assumed to be convex in the transmission rate. Hence, if the scheduler transmits faster, the delay is reduced at the cost of higher power consumption. To obtain the optimal delay-power tradeoff and the corresponding optimal policy, we model the problem as a Constrained Markov Decision Process (CMDP), where we minimize the average delay given an average power constraint. By steady-state analysis and Lagrangian relaxation, we show that the optimal tradeoff curve is decreasing, convex, and piecewise linear, and that the optimal policy is threshold-based. Based on these properties of the optimal policy, we develop an algorithm that efficiently obtains the optimal tradeoff curve and the optimal policy with full information of the system. The complexity of our proposed algorithm is much lower than that of a general algorithm based on Linear Programming. However, the distribution of the arrival process is usually unknown to the scheduler; we therefore propose a reinforcement learning algorithm to efficiently obtain the optimal policy under this circumstance. We also analyze in detail how the system parameters affect the optimal policy and the system performance. Finally, we use simulations to validate the derived results and the proposed algorithms.
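The delay-power tension described in the abstract can be illustrated with a small simulation of a threshold-based scheduler. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function `simulate`, the arrival distribution, the quadratic power function, the queue update order, and the use of Little's law to turn average queue length into average delay are all hypothetical choices made for illustration.

```python
import random


def simulate(thresholds, arrival_pmf, power, horizon=100_000, seed=0):
    """Simulate a deterministic threshold policy on a single queue.

    thresholds[k] is the smallest queue length at which k + 1 packets are
    sent (illustrative parameterization, not taken from the paper).
    Returns (average delay in slots, average power per slot).
    """
    rng = random.Random(seed)
    arrival_values = list(range(len(arrival_pmf)))
    queue = 0
    total_queue = 0
    total_power = 0.0
    total_arrivals = 0
    for _ in range(horizon):
        # Threshold rule: send one more packet for every threshold reached,
        # but never more packets than are currently backlogged.
        sent = min(queue, sum(1 for th in thresholds if queue >= th))
        total_power += power(sent)
        # Assumed dynamics: i.i.d. arrivals join after the transmission.
        arrivals = rng.choices(arrival_values, weights=arrival_pmf)[0]
        queue = queue - sent + arrivals
        total_queue += queue
        total_arrivals += arrivals
    # Little's law (an assumption here): delay = mean queue / arrival rate.
    avg_delay = total_queue / total_arrivals
    avg_power = total_power / horizon
    return avg_delay, avg_power


def quadratic_power(s):
    """Hypothetical convex power function."""
    return float(s * s)


pmf = [0.4, 0.4, 0.2]  # P(0 arrivals), P(1 arrival), P(2 arrivals)
eager = simulate((1, 2), pmf, quadratic_power)  # send up to 2 immediately
lazy = simulate((1, 6), pmf, quadratic_power)   # send 2 only when queue >= 6
print("eager: delay=%.3f power=%.3f" % eager)
print("lazy:  delay=%.3f power=%.3f" % lazy)
```

Both runs use the same seed, so they see the same arrival sequence. The lower second threshold drains the queue faster and spends more power per slot, while the higher one saves power and lets packets wait longer.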

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.08052/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1702.08052/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/1702.08052/full.md

---
Source: https://tomesphere.com/paper/1702.08052