Unraveling the Rainbow: can value-based methods schedule?
Arthur Corrêa, Alexandre Jesus, Paulo Nascimento, Cristóvão Silva, Samuel Moniz

TL;DR
This paper empirically compares value-based and policy-gradient deep reinforcement learning algorithms on the job-shop and flexible job-shop scheduling problems. It finds that value-based methods converge more stably and generalize better, which challenges the combinatorial optimization community's preference for policy-gradient algorithms.
Contribution
It provides the first extensive empirical evaluation of value-based versus policy-gradient algorithms in combinatorial scheduling, highlighting the strengths of value-based methods.
Findings
Value-based algorithms show lower variance and more stable convergence.
Value-based algorithms outperform policy-gradient methods in generalization.
Performance depends on problem structure, such as flexibility and size.
Abstract
In this work, we conduct an extensive empirical study of several deep reinforcement learning algorithms on two challenging combinatorial optimization problems: the job-shop and flexible job-shop scheduling problems, both fundamental challenges with multiple industrial applications. Broadly, deep reinforcement learning algorithms fall into two categories: policy-gradient and value-based. While value-based algorithms have achieved notable success in domains such as the Arcade Learning Environment, the combinatorial optimization community has predominantly favored policy-gradient algorithms, often overlooking the potential of value-based alternatives. From our results, value-based algorithms demonstrated a lower variance and a more stable convergence profile compared to policy-gradient ones. Moreover, they achieved superior cross-size and cross-distribution generalization, that is,…
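To make the two categories named in the abstract concrete, here is a minimal sketch contrasting a tabular Q-learning step (value-based) with a REINFORCE step on softmax logits (policy-gradient). It is illustrative only and is not the paper's implementation: the function names, the tabular setting, and the hyperparameters `alpha`, `gamma`, and `lr` are assumptions introduced here, and the algorithms studied in the paper use deep networks rather than tables.

```python
import math


def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Value-based step: move Q(s, a) toward r + gamma * max_a' Q(s', a').

    `q` is a hypothetical table (list of lists) indexed as q[state][action].
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits, action, ret, lr=0.1):
    """Policy-gradient step: ascend ret * grad log pi(action) for a softmax policy.

    The gradient of log softmax w.r.t. logit i is 1[i == action] - pi(i).
    `ret` is the sampled return that followed the action.
    """
    probs = softmax(logits)
    return [
        logit + lr * ret * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


# Toy usage with two states and two actions (all values hypothetical).
q_table = [[0.0, 0.0], [1.0, 2.0]]
q_learning_update(q_table, state=0, action=1, reward=1.0, next_state=1,
                  alpha=0.5, gamma=0.5)
new_logits = reinforce_update([0.0, 0.0], action=0, ret=1.0, lr=1.0)
```

The value-based step regresses toward a bootstrapped target, whereas the policy-gradient step scales its update by a sampled return. That sampled return is a commonly cited source of variance in policy-gradient methods, though the sketch itself does not demonstrate the variance gap reported in the abstract.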
Taxonomy
Topics: Health Systems, Economic Evaluations, Quality of Life · Clinical practice guidelines implementation · Delphi Technique in Research
Methods: Softmax · Attention Is All You Need
