Reinforcement Learning for Search Tree Size Minimization in Constraint Programming: New Results on Scheduling Benchmarks
Vilém Heinz, Petr Vilím, Zdeněk Hanzálek

TL;DR
This paper uses reinforcement learning to minimize the size of the search tree in Constraint Programming, reducing solve times and improving bounds on the JSSP and RCPSP scheduling benchmarks.
Contribution
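The paper shows that minimizing the search tree of Failure-Directed Search (FDS) is closely related to the multi-armed bandit (MAB) problem. It then applies MAB reinforcement learning algorithms to FDS, extending them with problem-specific refinements and tuning their parameters for scheduling problems.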
Findings
1. Enhanced FDS is 1.7x faster on JSSP and 2.1x faster on RCPSP benchmarks.
2. Enhanced FDS is also faster than the FDS algorithm in IBM CP Optimizer 22.1.
3. Improves or closes bounds on most benchmark instances.
Abstract
Failure-Directed Search (FDS) is a significant complete generic search algorithm used in Constraint Programming (CP) to efficiently explore the search space, proven particularly effective on scheduling problems. This paper analyzes FDS's properties, showing that minimizing the size of its search tree guided by ranked branching decisions is closely related to the Multi-armed bandit (MAB) problem. Building on this insight, MAB reinforcement learning algorithms are applied to FDS, extended with problem-specific refinements and parameter tuning, and evaluated on the two most fundamental scheduling problems, the Job Shop Scheduling Problem (JSSP) and Resource-Constrained Project Scheduling Problem (RCPSP). The resulting enhanced FDS, using the best extended MAB algorithm and configuration, performs 1.7 times faster on the JSSP and 2.1 times faster on the RCPSP benchmarks compared to the…
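To make the link between ranked branching decisions and the multi-armed bandit problem concrete, here is a minimal sketch in Python. It uses UCB1, a standard MAB algorithm, which is not necessarily the one the paper selects. Every name in it (`UCB1`, `simulate`, `fail_probs`) is hypothetical, and the reward (1 when a simulated branch fails immediately, 0 otherwise) is an assumed stand-in for whatever signal the paper's FDS variant actually uses.

```python
import math
import random


class UCB1:
    """Standard UCB1 bandit; each arm stands for one candidate branching decision."""

    def __init__(self, n_arms, c=math.sqrt(2)):
        self.counts = [0] * n_arms   # how often each arm was tried
        self.means = [0.0] * n_arms  # running mean reward per arm
        self.total = 0               # total number of pulls so far
        self.c = c                   # exploration weight (assumed value)

    def select(self):
        # Try every arm once before using the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        log_t = math.log(self.total)
        return max(
            range(len(self.counts)),
            key=lambda a: self.means[a]
            + self.c * math.sqrt(log_t / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total += 1
        # Incremental update of the running mean.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(fail_probs, steps=2000, seed=0):
    """Toy loop: reward 1.0 if the chosen decision 'fails fast', else 0.0.

    fail_probs are made-up probabilities, not data from the paper.
    """
    rng = random.Random(seed)
    bandit = UCB1(len(fail_probs))
    for _ in range(steps):
        arm = bandit.select()
        reward = 1.0 if rng.random() < fail_probs[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    bandit = simulate([0.2, 0.5, 0.9])
    print("pulls per decision:", bandit.counts)
```

In this toy setting the bandit concentrates its pulls on the decision most likely to fail quickly. That mirrors the intuition that preferring such decisions keeps the search tree small, but a real solver would derive rewards from the search itself rather than from fixed probabilities.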
