Reinforcement Learning for Node Selection in Branch-and-Bound
Alexander Mattick, Christopher Mutschler

TL;DR
This paper introduces a reinforcement learning approach using graph neural networks to improve node selection in branch-and-bound algorithms, leading to better efficiency and solution quality across various problem sets.
Contribution
It presents a novel RL-based simulation technique that considers entire tree states for node selection, outperforming existing methods on multiple benchmarks.
Findings
Significant reduction in optimality gap.
Improved per-node efficiency under time constraints.
Effective transfer from synthetic TSP training to real benchmarks.
Abstract
A big challenge in branch and bound lies in identifying the optimal node within the search tree from which to proceed. Current state-of-the-art selectors utilize either hand-crafted ensembles that automatically switch between naive sub-node selectors, or learned node selectors that rely on individual node data. We propose a novel simulation technique that uses reinforcement learning (RL) while considering the entire tree state, rather than just isolated nodes. To achieve this, we train a graph neural network that produces a probability distribution based on the path from the model's root to its "to-be-selected" leaves. Modelling node-selection as a probability distribution allows us to train the model using state-of-the-art RL techniques that capture both intrinsic node-quality and node-evaluation costs. Our method induces a high quality node selection policy on a set of varied and…
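The idea of scoring each candidate leaf by its path from the root, then turning those scores into a probability distribution, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the fixed per-node `score` values (stand-ins for what a graph neural network would output), and the choice of summation as the aggregation are all assumptions made for illustration only.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical branch-and-bound tree node carrying a learned score."""
    name: str
    score: float  # assumption: stand-in for a per-node GNN output
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, name: str, score: float) -> "Node":
        child = Node(name, score, parent=self)
        self.children.append(child)
        return child


def path_score(leaf: Node) -> float:
    """Sum the per-node scores from `leaf` up to the root (assumed aggregation)."""
    total = 0.0
    node: Optional[Node] = leaf
    while node is not None:
        total += node.score
        node = node.parent
    return total


def open_leaves(root: Node) -> List[Node]:
    """Collect all childless nodes, i.e. the candidates for selection."""
    stack, leaves = [root], []
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            leaves.append(node)
    return leaves


def leaf_distribution(root: Node) -> Dict[str, float]:
    """Softmax over path scores: one selection probability per open leaf."""
    leaves = open_leaves(root)
    scores = [path_score(leaf) for leaf in leaves]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return {leaf.name: e / norm for leaf, e in zip(leaves, exps)}


# Tiny example tree: root -> {a -> {a1, a2}, b}
root = Node("root", 0.0)
a = root.add_child("a", 1.0)
b = root.add_child("b", -1.0)
a1 = a.add_child("a1", 0.5)
a2 = a.add_child("a2", -0.5)

probs = leaf_distribution(root)
best = max(probs, key=probs.get)
```

Because the output is a distribution over open leaves rather than a single hard choice, a policy of this shape can in principle be sampled from and trained with policy-gradient style RL methods, which is the property the abstract highlights.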
Peer Reviews
Decision·Submitted to ICLR 2024
* Most of the paper is well written and easy to understand for readers with basic knowledge of reinforcement learning and branch-and-bound.
* The root-to-leaf path aggregated score is a clever design. It avoids the computational challenge posed by the growing branch-and-bound tree through an intuitive assumption: if a node is good, so should be its ancestors.
* The reward is not rigorously defined. Specifically, the paper does not disclose how gap(node selector) and gap(scip) are calibrated. It could be:
  * The gap when the time budget is reached.
  * The gap at the same number of nodes n, with $\text{traj}(\text{node selector})[:n]$ rolled out with the node selector and $\text{traj}(\text{scip})[:n]$ rolled out with scip.
  * The gap at the same number of nodes n, with $\text{traj}(\text{node selector})[:n]$ and $\text{traj}(\
1. The paper clearly states the issue it is trying to address (node selection in branch-and-bound) and the limitations of conventional methods on that issue.
2. The paper provides simulation results on a variety of problem instances.
1. Existing related works already use graph neural networks for node selection in the branch-and-bound algorithm. The proposed method uses graph neural networks for tree representation, but its difference from the existing works is not clearly stated.
2. The structure of the RL problem, such as states, actions, and reward function, is not rigorously defined in the paper. This makes it harder to understand how RL works in the proposed method.
3. As the branch-and-bound algorithm
- The global view of trees is a strong motivation given the limitations of current methods in BnB.
- I personally like the “greedy” aspect of the reasoning in the introduction: theory and practice have a gap, especially in cases like BnB, and in practice we should often favor a shorter-term choice over long-term ones if it is good enough, for many reasons. I think that holds across a large spectrum of deep learning applications nowadays.
- Positive results on many benchmarks.
- He
- The strong motivation leads to a much larger cost in carrying out the algorithm, especially when it involves recursion. However, it is not clear from the paper why the authors choose only the upper bound as a selection factor. It would be interesting to see a study, or perhaps a comparison, leading to that choice.
- To solve this complex problem, the proposed method has to be broken down into many phases, as shown in Section 2. That raises a question about practicality: can the met
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Machine Learning and Data Classification
Methods: Graph Neural Network
