Launchpad: Learning to Schedule Using Offline and Online RL Methods
Vanamala Venkataswamy, Jake Grigsby, Andrew Grimshaw, Yanjun Qi

TL;DR
This paper introduces Launchpad, a framework that combines offline and online reinforcement learning to improve job scheduling efficiency by leveraging historical data and expert demonstrations, reducing exploration time and enhancing policy performance.
Contribution
The paper presents a novel approach that integrates offline RL and behavior cloning with online training to accelerate learning of scheduling policies from historical datasets.
Findings
Offline RL and behavior cloning can learn effective scheduling policies from logged data.
Incorporating expert demonstrations accelerates online policy learning.
The framework enables continuous improvement through online data collection.
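Behavior cloning, named in the findings above, amounts to supervised learning on logged (state, action) pairs. Below is a minimal tabular sketch with assumed details. A state is a tuple of queued job lengths. The hypothetical expert `sjf_expert` schedules shortest-job-first. The learned policy is a per-state majority vote over the logged actions. This illustrates the general technique and is not the paper's implementation.

```python
from collections import Counter, defaultdict

def sjf_expert(queue):
    """Hypothetical expert heuristic: shortest-job-first (index of the shortest queued job)."""
    return min(range(len(queue)), key=lambda i: queue[i])

def behavior_cloning(dataset):
    """Tabular behavior cloning: map each logged state to its most frequently logged action."""
    counts = defaultdict(Counter)
    for state, action in dataset:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}

def act(policy, queue):
    """Use the cloned action for a logged state; otherwise fall back to index 0,
    because the cloned policy carries no information about unlogged states."""
    return policy.get(queue, 0)

# Logged (state, action) pairs recorded while the assumed expert was scheduling.
logged_states = [(3, 1, 2), (5, 4), (2, 7, 1), (3, 1, 2)]
dataset = [(s, sjf_expert(s)) for s in logged_states]
policy = behavior_cloning(dataset)
print(policy)  # prints {(3, 1, 2): 1, (5, 4): 1, (2, 7, 1): 2}
```

The fallback in `act` highlights a known limitation of behavior cloning. It only covers states that appear in the logs, which is one motivation for continuing to improve the policy with online data.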
Abstract
Deep reinforcement learning algorithms have succeeded in several challenging domains. Classic online RL job schedulers can learn efficient scheduling strategies but often take thousands of timesteps to explore the environment and adapt from a randomly initialized DNN policy. Existing RL schedulers overlook the importance of learning from historical data and improving upon custom heuristic policies. Offline reinforcement learning offers the prospect of policy optimization from pre-recorded datasets without online environment interaction. Following the recent success of data-driven learning, we explore two RL methods, 1) behaviour cloning and 2) offline RL, which aim to learn policies from logged data without interacting with the environment. These methods address the cost and safety of data collection, challenges that are particularly pertinent to real-world applications of RL.…
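The abstract describes offline RL as policy optimization from pre-recorded datasets without online interaction. Below is a minimal tabular sketch of that idea. Several details are assumptions for illustration and are not Launchpad's design: the single-machine job model in `step`, the reward (minus total job completion time), the full-coverage log built by `log_transitions`, and all function names. Practical offline RL algorithms usually add conservatism or behavior regularization for actions the dataset does not cover. This sketch sidesteps that problem by logging every action.

```python
from collections import defaultdict

def step(queue, action):
    """Toy single-machine model (assumed): run job `action` to completion.
    Every job still queued, including the chosen one, waits queue[action] time units,
    so an episode's summed reward equals minus the total job completion time."""
    reward = -len(queue) * queue[action]
    next_queue = queue[:action] + queue[action + 1:]
    return reward, next_queue

def log_transitions(queue, dataset):
    """Build a pre-recorded dataset of (s, a, r, s') tuples. Idealized: every action in
    every reachable state is logged, which real scheduler logs rarely provide."""
    for a in range(len(queue)):
        r, nxt = step(queue, a)
        dataset.append((queue, a, r, nxt))
        log_transitions(nxt, dataset)
    return dataset

def offline_q_iteration(dataset, sweeps=10):
    """Repeated Bellman backups over the fixed dataset only; `step` is never called here.
    Undiscounted (gamma = 1) because every episode ends when the queue is empty."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s2 in dataset:
            # An empty queue is terminal, so its value is 0.
            best_next = max((q[(s2, b)] for b in range(len(s2))), default=0.0)
            q[(s, a)] = r + best_next
    return q

def greedy(q, queue):
    """Extract the learned policy: pick the action with the highest Q-value."""
    return max(range(len(queue)), key=lambda a: q[(queue, a)])

dataset = log_transitions((3, 1, 2), [])
q = offline_q_iteration(dataset)
print(greedy(q, (3, 1, 2)), q[((3, 1, 2), 1)])  # prints 1 -10.0
```

In this toy setting, the policy learned purely from the log runs the shortest job first. Its value of -10 matches the completion times 1, 3, and 6 of the jobs with lengths 1, 2, and 3.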
Taxonomy
Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Smart Grid Energy Management
