Launchpad: Learning to Schedule Using Offline and Online RL Methods

Vanamala Venkataswamy; Jake Grigsby; Andrew Grimshaw; Yanjun Qi

arXiv:2212.00639 · cs.LG · December 5, 2022 · 1 citation

Open Access

TL;DR

This paper introduces Launchpad, a framework that combines offline and online reinforcement learning to improve job scheduling efficiency by leveraging historical data and expert demonstrations, reducing exploration time and enhancing policy performance.

Contribution

The paper presents a novel approach that integrates offline RL and behavior cloning with online training to accelerate learning of scheduling policies from historical datasets.

Findings

1. Offline RL and behavior cloning can learn effective scheduling policies from logged data.
2. Incorporating expert demonstrations accelerates online policy learning.
3. The framework enables continuous improvement through online data collection.

Abstract

Deep reinforcement learning algorithms have succeeded in several challenging domains. Classic online RL job schedulers can learn efficient scheduling strategies but often take thousands of timesteps to explore the environment and adapt from a randomly initialized DNN policy. Existing RL schedulers overlook the importance of learning from historical data and improving upon custom heuristic policies. Offline reinforcement learning presents the prospect of policy optimization from pre-recorded datasets without online environment interaction. Following the recent success of data-driven learning, we explore two RL methods: 1) Behaviour Cloning and 2) Offline RL, which aim to learn policies from logged data without interacting with the environment. These methods address the challenges concerning the cost of data collection and safety, particularly pertinent to real-world applications of RL.…
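
To make the idea of learning a policy from logged data concrete, the following is a minimal behaviour-cloning sketch, not the paper's implementation. Everything in it is an assumption made for illustration: the state is a list of three queued job durations, the logged "expert" is a hypothetical shortest-job-first heuristic, and the cloned policy is a linear softmax classifier trained with plain gradient descent rather than a DNN.

```python
import math
import random


def expert_action(state):
    """Hypothetical logged heuristic: schedule the shortest queued job."""
    return min(range(len(state)), key=lambda i: state[i])


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(weights, state):
    """Linear softmax policy: one weight row per action."""
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return softmax(logits)


def behaviour_clone(dataset, n_actions, n_features, epochs=200, lr=0.5):
    """Fit the policy to logged (state, action) pairs by minimising
    cross-entropy with full-batch gradient descent. No environment
    interaction happens here; only the logged data is used."""
    weights = [[0.0] * n_features for _ in range(n_actions)]
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in range(n_actions)]
        for state, action in dataset:
            probs = policy_probs(weights, state)
            for a in range(n_actions):
                err = probs[a] - (1.0 if a == action else 0.0)
                for j in range(n_features):
                    grad[a][j] += err * state[j]
        for a in range(n_actions):
            for j in range(n_features):
                weights[a][j] -= lr * grad[a][j] / len(dataset)
    return weights


def act(weights, state):
    """Greedy action of the cloned policy."""
    probs = policy_probs(weights, state)
    return max(range(len(probs)), key=lambda a: probs[a])


# Build a synthetic "historical log" from the hypothetical heuristic.
rng = random.Random(0)
logged = []
for _ in range(300):
    state = [rng.random() for _ in range(3)]
    logged.append((state, expert_action(state)))

weights = behaviour_clone(logged, n_actions=3, n_features=3)

# Fraction of logged decisions the cloned policy reproduces.
agreement = sum(act(weights, s) == a for s, a in logged) / len(logged)
print(f"agreement with logged heuristic: {agreement:.2f}")
```

A policy cloned this way could in principle serve as the starting point for further online training instead of a randomly initialized one, which is the kind of warm start the abstract motivates. Offline RL methods differ from this sketch in that they also use logged rewards rather than only imitating the logged actions.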

Taxonomy

Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Smart Grid Energy Management