SOLO: Search Online, Learn Offline for Combinatorial Optimization Problems
Joel Oren, Chana Ross, Maksym Lefarov, Felix Richter, Ayal Taitler, Zohar Feldman, Christian Daniel, Dotan Di Castro

TL;DR
This paper introduces SOLO, a hybrid RL and planning approach for combinatorial problems like scheduling and routing, capable of handling both offline and online variants with improved efficiency and robustness.
Contribution
The paper presents a generic, scalable method combining Deep Q-Learning and Monte Carlo Tree Search for online and offline combinatorial optimization.
Findings
Outperforms traditional solvers and heuristics in speed and quality
Effective in online settings with dynamic problem components
Improves robustness of learned policies with search algorithms
Abstract
We study combinatorial problems with real-world applications such as machine scheduling, routing, and assignment. We propose a method that combines Reinforcement Learning (RL) and planning. This method applies equally to the offline variant of the combinatorial problem and to the online variant, in which the problem components (e.g., jobs in scheduling problems) are not known in advance but arrive during the decision-making process. Our solution is quite generic, scalable, and leverages distributional knowledge of the problem parameters. We frame the solution process as an MDP and take a Deep Q-Learning approach wherein states are represented as graphs, thereby allowing our trained policies to deal with arbitrary changes in a principled manner. Though learned policies work well in expectation, small deviations can have substantial negative effects in combinatorial…
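The idea of backing a learned value estimate with search can be illustrated with a toy sketch. This is not the SOLO implementation. It assumes a single-machine scheduling problem with weighted completion time as the cost, replaces the trained graph-based Q-network with a hand-written myopic stand-in (`q_stub`), and swaps Monte Carlo Tree Search for a plain exhaustive depth-limited lookahead. All names (`Job`, `step`, `q_stub`, `search_value`, `plan`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    duration: int
    weight: int


def step(jobs, time, remaining, a):
    """MDP transition: run job `a` next; the reward is minus its weighted completion time."""
    finish = time + jobs[a].duration
    return finish, remaining - {a}, -jobs[a].weight * finish


def q_stub(jobs, time, remaining, a):
    """Assumed stand-in for a trained Q-network: it only sees the immediate reward."""
    return step(jobs, time, remaining, a)[2]


def search_value(jobs, time, remaining, a, depth):
    """Exact lookahead for `depth` steps, falling back on q_stub at the leaves."""
    if depth == 0:
        return q_stub(jobs, time, remaining, a)
    time, rest, reward = step(jobs, time, remaining, a)
    if not rest:  # terminal state: no jobs left
        return reward
    return reward + max(search_value(jobs, time, rest, b, depth - 1) for b in rest)


def plan(jobs, depth):
    """Schedule all jobs, choosing each action by depth-limited search over q_stub."""
    time, remaining, order, cost = 0, frozenset(range(len(jobs))), [], 0
    while remaining:
        a = max(sorted(remaining),
                key=lambda b: search_value(jobs, time, remaining, b, depth))
        time, remaining, reward = step(jobs, time, remaining, a)
        order.append(a)
        cost -= reward
    return order, cost


JOBS = [Job(duration=3, weight=1), Job(duration=1, weight=1), Job(duration=2, weight=5)]
print(plan(JOBS, depth=0))  # greedy on q_stub alone: ([1, 0, 2], 35)
print(plan(JOBS, depth=3))  # full-depth search:      ([2, 1, 0], 19)
```

On this instance the myopic estimate alone defers the heavy job and pays a cost of 35, while searching to full depth recovers the optimal cost of 19. In a realistic setting the lookahead would be a sampled tree search with a bounded budget rather than an exhaustive one.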
Taxonomy
Methods: Q-Learning
