Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling
Constantin Waubert de Puiseau, Christian Dörpelkus, Jannik Peters, Hasan Tercan, Tobias Meisen

TL;DR
This paper introduces a method called δ-sampling to adaptively balance exploration and exploitation in trained reinforcement learning agents for job shop scheduling, improving solution quality within computational constraints.
Contribution
It proposes a novel inference technique for DRL agents that adjusts their behavior to the available computational budget, improving solution diversity and quality.
Findings
δ-sampling improves search space coverage
Optimal parameterization enhances solution quality
Method outperforms standard inference approaches
Abstract
Learned construction heuristics for scheduling problems have become increasingly competitive with established solvers and heuristics in recent years. In particular, significant improvements have been observed in solution approaches using deep reinforcement learning (DRL). While much attention has been paid to the design of network architectures and training algorithms to achieve state-of-the-art results, little research has investigated the optimal use of trained DRL agents during inference. Our work is based on the hypothesis that, similar to search algorithms, the utilization of trained DRL agents should depend on the acceptable computational budget. We propose a simple yet effective parameterization, called δ-sampling, which manipulates the trained action vector to bias agent behavior towards exploration or exploitation during solution construction. By following this…
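The abstract does not state the exact form of δ-sampling. As a rough illustration only, the sketch below shows one way a single scalar could reshape a trained policy's action distribution toward exploitation or exploration, and how a fixed rollout budget could then be spent on repeated sampling while keeping the schedule with the lowest makespan. The linear blending rule, the [-1, 1] range for `delta`, and the names `delta_adjust`, `sample_action`, and `best_of_budget` are all hypothetical assumptions. They are not the authors' formulation.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def delta_adjust(probs: Sequence[float], delta: float) -> List[float]:
    """Hypothetical reshaping of a trained action distribution.

    ASSUMPTION (not the paper's formula): delta lies in [-1, 1].
      delta > 0  -> blend toward the greedy argmax action (exploitation)
      delta < 0  -> blend toward uniform over feasible actions (exploration)
      delta == 0 -> keep the trained distribution unchanged
    Actions with probability 0 (e.g. masked, infeasible operations) stay at 0.
    """
    if not -1.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [-1, 1]")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have positive mass")
    p = [x / total for x in probs]  # normalize defensively
    n = len(p)
    if delta >= 0:
        best = max(range(n), key=lambda i: p[i])
        target = [1.0 if i == best else 0.0 for i in range(n)]
        w = delta
    else:
        support = [i for i in range(n) if p[i] > 0]  # feasible actions only
        target = [1.0 / len(support) if p[i] > 0 else 0.0 for i in range(n)]
        w = -delta
    # Convex combination of the trained and target distributions.
    return [(1.0 - w) * pi + w * ti for pi, ti in zip(p, target)]


def sample_action(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one action index from a (reshaped) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def best_of_budget(
    rollout: Callable[[random.Random], Tuple[object, float]],
    budget: int,
    seed: int = 0,
) -> Optional[Tuple[object, float]]:
    """Spend `budget` rollouts and keep the lowest-makespan schedule.

    `rollout` is a hypothetical callable that constructs one schedule
    (e.g. by calling sample_action at every dispatching step) and
    returns (schedule, makespan).
    """
    rng = random.Random(seed)
    best: Optional[Tuple[object, float]] = None
    for _ in range(budget):
        schedule, makespan = rollout(rng)
        if best is None or makespan < best[1]:
            best = (schedule, makespan)
    return best
```

Under this assumption, a small budget would favor a `delta` near 1 (close to greedy decoding), while a larger budget leaves room for a negative `delta` that spreads rollouts over more of the search space.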
Taxonomy
Topics: Scheduling and Optimization Algorithms · Smart Grid Energy Management
