Reinforcement Learning for Node Selection in Branch-and-Bound

Alexander Mattick; Christopher Mutschler

arXiv:2310.00112 · cs.LG · June 6, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces a reinforcement learning approach using graph neural networks to improve node selection in branch-and-bound algorithms, leading to better efficiency and solution quality across various problem sets.

Contribution

It presents a novel RL-based simulation technique that considers entire tree states for node selection, outperforming existing methods on multiple benchmarks.

Findings

01. Significant reduction in optimality gap.
02. Improved per-node efficiency under time constraints.
03. Effective transfer from synthetic TSP training to real benchmarks.

Abstract

A big challenge in branch-and-bound lies in identifying the optimal node within the search tree from which to proceed. Current state-of-the-art selectors utilize either hand-crafted ensembles that automatically switch between naive sub-node selectors, or learned node selectors that rely on individual node data. We propose a novel simulation technique that uses reinforcement learning (RL) while considering the entire tree state, rather than just isolated nodes. To achieve this, we train a graph neural network that produces a probability distribution based on the path from the model's root to its "to-be-selected" leaves. Modelling node selection as a probability distribution allows us to train the model using state-of-the-art RL techniques that capture both intrinsic node quality and node-evaluation costs. Our method induces a high-quality node selection policy on a set of varied and…
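A minimal sketch of a path-based distribution over leaves, under stated assumptions: each node carries a scalar score (a plain float standing in for the GNN output), a leaf's logit is the sum of scores on its root-to-leaf path, and a softmax over the childless nodes yields the selection probabilities. The `Node` class, `leaf_distribution`, and the sum-then-softmax rule are hypothetical illustrations, not the authors' exact formulation; pruning is ignored, so every childless node counts as open.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality avoids recursive comparisons through parent/children
class Node:
    """Hypothetical branch-and-bound tree node used only for illustration."""
    name: str
    score: float  # stand-in for a per-node GNN output (assumption)
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    path_score: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        # Cache the root-to-node sum once, reusing the parent's cached value.
        parent_sum = self.parent.path_score if self.parent is not None else 0.0
        self.path_score = parent_sum + self.score

    def add_child(self, name: str, score: float) -> "Node":
        child = Node(name, score, parent=self)
        self.children.append(child)
        return child


def open_leaves(root: Node) -> List[Node]:
    """Collect childless nodes; pruning is ignored in this sketch."""
    stack, leaves = [root], []
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            leaves.append(node)
    return leaves


def leaf_distribution(root: Node) -> dict:
    """Softmax over open leaves, with each leaf's root-to-leaf path score as its logit."""
    leaves = open_leaves(root)
    max_logit = max(leaf.path_score for leaf in leaves)  # shift for numerical stability
    weights = {leaf.name: math.exp(leaf.path_score - max_logit) for leaf in leaves}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Usage: a tiny tree with three open leaves (b, a1, a2).
root = Node("root", 0.0)
a = root.add_child("a", 1.0)
b = root.add_child("b", 0.0)
a.add_child("a1", 0.5)
a.add_child("a2", -0.5)
probs = leaf_distribution(root)
```

Caching each node's path sum at creation means adding a node costs O(1) and scoring a leaf needs no walk back to the root, which is one way to keep the cost manageable as the tree grows.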

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 5: marginally below the acceptance threshold · Confidence 3

Strengths

* Most of the paper is well written and easy to understand for readers with basic knowledge of reinforcement learning and branch-and-bound.
* The root-to-leaf path aggregated score is a clever design. It sidesteps the computational challenge posed by the growing branch-and-bound tree through an intuitive assumption: if a node is good, so should be its ancestors.

Weaknesses

* The reward is not rigorously defined. Specifically, the paper does not disclose how gap(node selector) and gap(scip) are calibrated. It could be:
  * the gap when the time budget is reached;
  * the gap at the same number of nodes $n$, with $\text{traj}(\text{node selector})[:n]$ rolled out with the node selector and $\text{traj}(\text{scip})[:n]$ rolled out with SCIP;
  * the gap at the same number of nodes $n$, with $\text{traj}(\text{node selector})[:n]$ and …
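Reviewer 01's second reading (both selectors compared at the same node count $n$) can be sketched as follows, assuming each run is logged as one (primal bound, dual bound) pair per processed node. The relative gap follows the definition SCIP reports, $|p - d| / \min(|p|, |d|)$, returning zero when the bounds coincide and infinity when either bound is zero or infinite or their signs differ. The trajectory format, the helper names, and matching at the shorter run's length are hypothetical choices; this is not the paper's reward.

```python
import math
from typing import Sequence, Tuple

# Hypothetical log format: one (primal_bound, dual_bound) pair per processed node.
Trajectory = Sequence[Tuple[float, float]]


def relative_gap(primal: float, dual: float) -> float:
    """Relative gap in the style SCIP reports: |p - d| / min(|p|, |d|)."""
    if primal == dual:
        return 0.0
    # Zero, infinite, or opposite-sign bounds make the relative gap infinite.
    if (primal == 0.0 or dual == 0.0 or math.isinf(primal)
            or math.isinf(dual) or primal * dual < 0.0):
        return math.inf
    return abs(primal - dual) / min(abs(primal), abs(dual))


def gap_at_node_count(traj: Trajectory, n: int) -> float:
    """Gap after the first n nodes of a run, i.e. evaluated on traj[:n]."""
    if n <= 0 or n > len(traj):
        raise ValueError("n must lie in 1..len(traj)")
    primal, dual = traj[:n][-1]  # bounds recorded after the n-th node
    return relative_gap(primal, dual)


def matched_gaps(traj_selector: Trajectory, traj_scip: Trajectory) -> Tuple[float, float]:
    """Compare both runs at the same n: the shorter trajectory's length (an assumption)."""
    n = min(len(traj_selector), len(traj_scip))
    return gap_at_node_count(traj_selector, n), gap_at_node_count(traj_scip, n)


# Usage with made-up bounds for a minimization problem (no incumbent at node 1).
selector_run = [(math.inf, 90.0), (120.0, 95.0), (105.0, 100.0)]
scip_run = [(math.inf, 90.0), (130.0, 92.0), (118.0, 96.0), (110.0, 99.0)]
gap_sel, gap_scip = matched_gaps(selector_run, scip_run)
```

Replacing the node-count lookup with a lookup at a wall-clock deadline would instead correspond to the reviewer's first reading (the gap at the time budget).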

Reviewer 02 · Rating 3: reject, not good enough · Confidence 3

Strengths

1. The paper clearly states the issue it addresses (node selection in branch-and-bound) and the limitations of conventional methods on that issue.
2. The paper provides simulation results on a variety of problem instances.

Weaknesses

1. Existing works already use graph neural networks for node selection in the branch-and-bound algorithm. The proposed method also uses graph neural networks for tree representation, but its difference from these works is not clearly stated.
2. The RL structure (states, actions, and reward function) is not rigorously defined in the paper, which makes it harder to understand how the RL component of the proposed method works.
3. As the branch-and-bound algorithm …

Reviewer 03 · Rating 5: marginally below the acceptance threshold · Confidence 3

Strengths

- The global view of trees is a strong motivation given the limitations of current methods in BnB.
- I personally like the “greedy” reasoning in the introduction: theory and practice have a gap, especially in cases like BnB, and in practice we should often favor a shorter-term choice over long-term ones if it is good enough, for many reasons. I think this holds across a large spectrum of deep learning applications nowadays.
- Positive results on many benchmarks.
- He…

Weaknesses

- The strong motivation leads to a much larger cost in carrying out the algorithm, especially when it involves recursion. However, the paper does not make clear why the authors choose only the upper bound as a factor in node selection. It would be interesting to see a study, perhaps a comparison, that leads to that choice.
- To solve this complex problem, the proposed method has to be broken down into many phases, as shown in Section 2. That raises a question about practicality: can the met…


Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Machine Learning and Data Classification

Methods: Graph Neural Network