A Markov Decision Process for Variable Selection in Branch & Bound

Paul Strang; Zacharie Alès; Côme Bissuel; Olivier Juan; Safia Kedad-Sidhoum; Emmanuel Rachelson

arXiv:2510.19348·cs.LG·October 23, 2025

Open Access · 4 Reviews

TL;DR

This paper introduces BBMDP, a Markov Decision Process framework for variable selection in Branch & Bound algorithms, enabling reinforcement learning to improve MILP solving efficiency, with empirical validation showing superior performance over existing RL agents.

Contribution

The paper presents a novel MDP formulation for variable selection in B&B, facilitating the application of RL algorithms to learn better heuristics for MILP problems.

Findings

01

The BBMDP model outperforms previous RL agents on four MILP benchmarks.

02

Empirical results demonstrate improved B&B efficiency using the proposed MDP approach.

03

The formulation enables leveraging a broad range of RL algorithms for B&B heuristics.

Abstract

Mixed-Integer Linear Programming (MILP) is a powerful framework used to address a wide range of NP-hard combinatorial optimization problems, often solved by Branch and Bound (B&B). A key factor influencing the performance of B&B solvers is the variable selection heuristic governing branching decisions. Recent contributions have sought to adapt reinforcement learning (RL) algorithms to the B&B setting to learn optimal branching policies through Markov Decision Process (MDP)-inspired formulations, together with ad hoc convergence theorems and algorithms. In this work, we introduce BBMDP, a principled vanilla MDP formulation for variable selection in B&B, which makes it possible to leverage a broad range of RL algorithms for learning optimal B&B heuristics. Computational experiments validate our model empirically, as our branching agent outperforms prior state-of-the-art RL agents on four…
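To make the idea of "variable selection as a sequential decision problem" concrete, here is a minimal sketch of a B&B loop in which a pluggable policy chooses the branching variable. It is not the BBMDP formulation itself, whose details are not given in this excerpt. Everything in it is an assumption made for illustration: the toy 0/1 knapsack instance, the greedy fractional relaxation used as the bound, depth-first node selection, the reward of -1 per expanded node (a common proxy for tree size), and the names `solve_relaxation`, `run_episode`, and `first_free`.

```python
from typing import Callable, Dict, List, Optional, Tuple

Fixed = Dict[int, int]  # partial assignment: variable index -> 0 or 1
Policy = Callable[[Fixed, List[int]], int]  # picks one free variable to branch on


def solve_relaxation(values: List[int], weights: List[int], capacity: int,
                     fixed: Fixed) -> Optional[Tuple[float, bool]]:
    """Fractional (LP) relaxation of a 0/1 knapsack node.

    Returns (upper bound, whether the relaxed solution is integral),
    or None if the fixed variables already exceed the capacity.
    Assumes strictly positive weights.
    """
    remaining = capacity - sum(weights[i] for i, v in fixed.items() if v == 1)
    if remaining < 0:
        return None
    bound = float(sum(values[i] for i, v in fixed.items() if v == 1))
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if weights[i] <= remaining:
            remaining -= weights[i]
            bound += values[i]
        else:
            if remaining > 0:
                # One item is taken fractionally: the relaxation is not integral.
                bound += values[i] * remaining / weights[i]
                return bound, False
            break
    return bound, True


def run_episode(values: List[int], weights: List[int], capacity: int,
                policy: Policy) -> Tuple[float, int]:
    """Run B&B to completion; return (best objective, episode return)."""
    incumbent = 0.0           # the empty knapsack is always feasible
    open_nodes: List[Fixed] = [{}]
    episode_return = 0
    while open_nodes:
        fixed = open_nodes.pop()      # depth-first node selection (assumption)
        episode_return -= 1           # illustrative reward: -1 per expanded node
        result = solve_relaxation(values, weights, capacity, fixed)
        if result is None:
            continue                  # infeasible node
        bound, integral = result
        if bound <= incumbent:
            continue                  # pruned by bound
        if integral:
            incumbent = bound         # new best feasible solution
            continue
        free = [i for i in range(len(values)) if i not in fixed]
        var = policy(fixed, free)     # the action: which variable to branch on
        open_nodes.append({**fixed, var: 0})
        open_nodes.append({**fixed, var: 1})
    return incumbent, episode_return


def first_free(fixed: Fixed, free: List[int]) -> int:
    """Baseline policy: branch on the lowest-index free variable."""
    return free[0]


if __name__ == "__main__":
    best, ret = run_episode([60, 100, 120], [10, 20, 30], 50, first_free)
    print(best, ret)
```

Any policy returns the same optimal objective here; only the number of expanded nodes, and hence the episode return, changes. That difference is the quantity a learned branching policy would try to improve. A real setup would replace the toy relaxation with an LP solver and give the policy a richer observation than the bare partial assignment.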

Peer Reviews

Decision·NeurIPS 2025 poster

Reviewer 01 · Rating 5 · Confidence 3

Strengths

- The paper studies an interesting and important problem. B&B algorithms are widely used for combinatorial optimization, and any improvement here is likely to be of significant interest and to have a large impact.
- The paper is well written. The problem formulation is clearly described. The description of prior work is very good.
- The proposed approach is intuitively clear and mathematically rigorous. The optimization problem is well defined and the reformulation to stochastic sh

Reviewer 02 · Rating 4 · Confidence 4

Strengths

**Strengths**:

* Enables the use of RL frameworks within B&B solvers.
* Models the B&B process as a temporal decision-making problem, allowing for the incorporation of time-dependent strategies (e.g., node selection with early stop).
* BBMDP demonstrates strong empirical performance, outperforming other RL-based approaches in benchmark tests.

**Weaknesses**:

1. The presentation of the work is hard to follow when a nomenclature is not present. Since there are symbols with both B&B and RL their def

Reviewer 03 · Rating 5 · Confidence 4

Strengths

Attempts to use RL for variable selection in MILP branch-and-bound solving have not yet reached their potential. The paper makes the case well that, in principle, RL can surpass supervised learning, namely imitation learning from strong branching. A difficulty has been formulating an MDP for the RL task that has the Markov property. Etheve et al. in 2020 made a not entirely satisfactory effort, which was formalised by Scavuzzo et al. in 2022 as the TreeMDP, accompanied by a careful empirical study. Thi

Reviewer 04 · Rating 5 · Confidence 4

Strengths

**A Principled Formulation.** The paper’s primary strength is the introduction of the “Branch & Bound Markov Decision Process” (BBMDP). This is a significant contribution, advancing the field from “MDP-inspired” but technically flawed frameworks (e.g., TreeMDP) to a principled, standard MDP formulation that is theoretically sound.

**Insightful Analysis.** The paper provides a clear and insightful diagnosis of a core issue with prior approaches in Section 3.2, “Misconceptions in Learning to Ex

Taxonomy

Topics: Reinforcement Learning in Robotics · Constraint Satisfaction and Optimization · Vehicle Routing Optimization Methods