Learning Branching Policies for MILPs with Proximal Policy Optimization
Abdelouahed Ben Mhamed, Assia Kamal-Idrissi, Amal El Fallah Seghrouchni

TL;DR
This paper introduces TGPPO, a reinforcement learning framework that uses Proximal Policy Optimization (PPO) to learn branching policies for MILPs and generalizes better across diverse instances than imitation learning methods.
Contribution
It proposes a novel RL-based approach with a dynamic state representation for branching in MILPs, improving generalization over existing imitation learning methods.
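To make "branching policy" concrete: at each B&B node, the LP relaxation leaves some integer variables at fractional values, and a branching policy picks one of them to split on. The sketch below contrasts a classic hand-crafted rule (most-fractional) with a softmax policy over linearly scored candidate features, the kind of stochastic policy an RL method such as PPO can train. The function names, the feature vectors, and the linear scorer are hypothetical illustrations; TGPPO's actual state representation and network are not reproduced here.

```python
import math
import random
from typing import Dict, List, Sequence


def fractional_candidates(lp_values: Dict[str, float],
                          integer_vars: Sequence[str],
                          tol: float = 1e-6) -> List[str]:
    """Integer variables whose LP-relaxation value is not integral (branching candidates)."""
    return [v for v in integer_vars if abs(lp_values[v] - round(lp_values[v])) > tol]


def most_fractional(lp_values: Dict[str, float], candidates: Sequence[str]) -> str:
    """Hand-crafted baseline: branch on the variable whose fractional part is closest to 0.5."""
    def score(v: str) -> float:
        frac = lp_values[v] - math.floor(lp_values[v])
        return min(frac, 1.0 - frac)
    return max(candidates, key=score)


def softmax_policy(features: Dict[str, List[float]],
                   weights: Sequence[float]) -> Dict[str, float]:
    """Hypothetical learned policy: linear score per candidate, normalized with a softmax."""
    scores = {v: sum(w * x for w, x in zip(weights, f)) for v, f in features.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {v: math.exp(s - top) for v, s in scores.items()}
    z = sum(exps.values())
    return {v: e / z for v, e in exps.items()}


def sample_branching_variable(probs: Dict[str, float], rng: random.Random) -> str:
    """Sample an action during RL training (at evaluation time one might take the argmax)."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


if __name__ == "__main__":
    lp = {"x1": 2.5, "x2": 3.0, "x3": 0.8}
    cands = fractional_candidates(lp, ["x1", "x2", "x3"])
    print(cands)                       # ['x1', 'x3']
    print(most_fractional(lp, cands))  # x1
    probs = softmax_policy({"x1": [0.5, 1.0], "x3": [0.2, 0.0]}, weights=[2.0, 1.0])
    print(sample_branching_variable(probs, random.Random(0)))
```

The hand-crafted rule is deterministic, while the softmax policy exposes action probabilities, which is what a policy-gradient method needs in order to compute log-probabilities of the branching decisions it took.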
Findings
TGPPO outperforms existing policies in reducing the number of explored nodes.
It improves p-Primal-Dual Integrals (PDI), especially on out-of-distribution instances.
The approach demonstrates robustness and adaptability across diverse MILP problems.
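For intuition about the primal-dual integral metric: for a minimization problem the solver maintains a primal bound (best incumbent found so far) and a dual bound, and a primal-dual integral accumulates the gap between them over solving time, so lower is better. The sketch below integrates the raw absolute gap as a step function over logged bound updates. This is an assumed, simplified definition: the paper's p-PDI may apply a different normalization, and the function name and input format are hypothetical.

```python
from typing import List, Tuple


def primal_dual_integral(events: List[Tuple[float, float, float]],
                         time_limit: float) -> float:
    """Integrate (primal_bound - dual_bound) over [events[0][0], time_limit].

    events: (time, primal_bound, dual_bound) tuples sorted by time, for a
    minimization problem; each pair of bounds holds until the next event.
    Assumes finite bounds from the first event on (no incumbent means an
    infinite gap, which this simplified sketch does not model).
    """
    total = 0.0
    for i, (t, primal, dual) in enumerate(events):
        if t >= time_limit:
            break
        t_next = events[i + 1][0] if i + 1 < len(events) else time_limit
        gap = max(primal - dual, 0.0)  # a closed gap contributes nothing
        total += gap * (min(t_next, time_limit) - t)
    return total


# Gap 40 for 2 s, 20 for 3 s, then closed for 5 s: 40*2 + 20*3 + 0*5
print(primal_dual_integral([(0.0, 100.0, 60.0), (2.0, 90.0, 70.0), (5.0, 80.0, 80.0)], 10.0))  # 140.0
```

Because the gap is weighted by how long it persists, a branching policy that tightens the bounds earlier scores better even if the final gap is the same.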
Abstract
Branch-and-Bound (B&B) is the dominant exact solution method for Mixed Integer Linear Programs (MILP), yet its exponential time complexity poses significant challenges for large-scale instances. The growing capabilities of machine learning have spurred efforts to improve B&B by learning data-driven branching policies. However, most existing approaches rely on Imitation Learning (IL), which tends to overfit to expert demonstrations and struggles to generalize to structurally diverse or unseen instances. In this work, we propose Tree-Gate Proximal Policy Optimization (TGPPO), a novel framework that employs Proximal Policy Optimization (PPO), a Reinforcement Learning (RL) algorithm, to train a branching policy aimed at improving generalization across heterogeneous MILP instances. Our approach builds on a parameterized state space representation that dynamically captures the evolving…
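PPO's defining ingredient is the clipped surrogate objective of Schulman et al. (2017). With probability ratio r = pi_new(a|s) / pi_old(a|s) and advantage estimate A, it maximizes the sample average of min(r * A, clip(r, 1 - eps, 1 + eps) * A), which keeps each update close to the policy that collected the data. The sketch below computes that standard objective in plain Python, assuming the actions are the chosen branching variables; it omits the value-function loss and entropy bonus that full PPO implementations usually add, and it is not a reproduction of TGPPO's exact training loss.

```python
import math
from typing import Sequence


def ppo_clip_loss(new_logp: Sequence[float],
                  old_logp: Sequence[float],
                  advantages: Sequence[float],
                  eps: float = 0.2) -> float:
    """Negated PPO clipped surrogate objective, averaged over samples (minimize this).

    new_logp / old_logp: log-probabilities of the taken actions (e.g. chosen
    branching variables) under the current and the data-collecting policy.
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(ratio, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic (lower) bound
    return -total / len(advantages)


# Ratio 2 with a positive advantage is capped at 1.2 when eps = 0.2.
print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

With a positive advantage, the clip removes any incentive to push the ratio beyond 1 + eps; with a negative advantage, the min keeps the unclipped, more pessimistic term, so harmful actions are still penalized in full.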
Taxonomy
Topics: Constraint Satisfaction and Optimization · Reinforcement Learning in Robotics · Advanced Optimization Algorithms Research
