Safe Learning for Near Optimal Scheduling
Damien Busatto-Gaston, Debraj Chakraborty, Shibashis Guha, Guillermo A. Pérez, Jean-François Raskin

TL;DR
This paper presents a novel approach combining synthesis, model-based learning, and online sampling to develop safe, near-optimal schedulers for large preemptible task systems, overcoming limitations of existing model checkers.
Contribution
The paper introduces model-learning algorithms with PAC guarantees and extends Monte-Carlo tree search with safety advice, enabling safe exploration of MDPs that are too large for state-of-the-art probabilistic model checkers.
Findings
Algorithms outperform shielded deep Q-learning on large systems
Provides PAC guarantees for model learning
Handles MDPs with 10^20 states and beyond
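To give a rough sense of what a PAC guarantee for model learning involves, the sketch below computes how many independent samples suffice to estimate a single probability to within epsilon with confidence at least 1 - delta, using the standard two-sided Hoeffding bound. This is a generic illustration, not the bound derived in the paper; the function name `hoeffding_sample_size` and its parameters are assumptions made for this example.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Samples sufficient for |estimate - p| <= epsilon with probability >= 1 - delta.

    Uses the two-sided Hoeffding inequality for i.i.d. samples in [0, 1]:
        P(|mean - p| >= epsilon) <= 2 * exp(-2 * n * epsilon**2)
    Solving 2 * exp(-2 * n * epsilon**2) <= delta for n gives the bound below.
    Hypothetical helper for illustration only; not taken from the paper.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must be in (0, 1)")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Estimate one probability to within 0.1 with 95% confidence.
    print(hoeffding_sample_size(epsilon=0.1, delta=0.05))  # prints 185
```

Halving epsilon roughly quadruples the sample count, while tightening delta costs only logarithmically, which is why accuracy rather than confidence tends to dominate the sampling budget.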
Abstract
In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) that have 10^20 states and beyond, which cannot be handled with state-of-the-art probabilistic model-checkers. We provide probably approximately correct (PAC) guarantees for learning the model. Additionally, we extend Monte-Carlo tree search with advice, computed using safety games or obtained using the earliest-deadline-first scheduler, to safely explore the learned model online. Finally, we implemented and compared our algorithms empirically against shielded deep Q-learning on large task systems.
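The earliest-deadline-first (EDF) scheduler mentioned as one source of advice follows a simple rule: among the jobs that still need processor time, run the one whose deadline is closest. A minimal sketch of that rule is shown here; the `Job` record, its field names, and the `edf_advice` function are hypothetical names chosen for this example and do not reflect the authors' implementation or how the advice is wired into the tree search.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Job:
    """Hypothetical snapshot of one task's current job."""
    task: str        # task identifier
    remaining: int   # computation time still required (time units)
    deadline: int    # time units left before the deadline


def edf_advice(jobs: Sequence[Job]) -> Optional[str]:
    """Return the task that EDF would run next, or None if nothing is pending.

    Ties on the deadline are broken by task name so the choice is deterministic.
    """
    active = [job for job in jobs if job.remaining > 0]
    if not active:
        return None
    return min(active, key=lambda job: (job.deadline, job.task)).task


if __name__ == "__main__":
    snapshot = [
        Job(task="sensor", remaining=2, deadline=5),
        Job(task="control", remaining=1, deadline=3),
        Job(task="logger", remaining=0, deadline=1),  # already finished
    ]
    print(edf_advice(snapshot))  # prints control
```

In an advice-guided search, a function like this could be used to restrict or bias the actions tried at each node, so that exploration stays close to schedules that respect deadlines.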
Taxonomy
Topics: Real-Time Systems Scheduling · Software Reliability and Analysis Research · Age of Information Optimization
