Learning Branching Policies for MILPs with Proximal Policy Optimization

Abdelouahed Ben Mhamed; Assia Kamal-Idrissi; Amal El Fallah Seghrouchni

arXiv:2511.12986 · cs.LG · November 18, 2025

TL;DR

This paper introduces TGPPO (Tree-Gate Proximal Policy Optimization), a reinforcement learning framework that uses Proximal Policy Optimization (PPO) to learn branching policies for Mixed Integer Linear Programs (MILPs) and generalizes better across diverse instances than imitation learning methods.

Contribution

It proposes a novel RL-based approach with a dynamic state representation for branching in MILPs, improving generalization over existing imitation learning methods.
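
To make "learning a branching policy with RL" concrete, the sketch below shows one common way to frame the branching decision as a stochastic action: score each fractional candidate variable at the current B&B node, turn the scores into a softmax distribution, sample a variable, and keep the log-probability that a policy-gradient method such as PPO needs. This is a minimal sketch under stated assumptions, not the authors' TGPPO model: the linear scorer, the per-candidate feature vectors, and the function name `select_branching_variable` are all hypothetical stand-ins for the learned network and state representation described in the paper.

```python
import math
import random
from typing import Sequence, Tuple


def select_branching_variable(
    candidates: Sequence[int],
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    rng: random.Random,
) -> Tuple[int, float]:
    """Sample a branching variable from a softmax over linear scores.

    candidates: indices of fractional integer variables at the current node.
    features:   one (hypothetical) feature vector per candidate.
    weights:    parameters of a hypothetical linear scorer standing in for
                a learned policy network.
    Returns the chosen variable index and the log-probability of choosing it.
    """
    # Linear score per candidate: dot product of weights and features.
    scores = [sum(w * f for w, f in zip(weights, feat)) for feat in features]
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample one candidate position according to the softmax probabilities.
    k = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[k], math.log(probs[k])


if __name__ == "__main__":
    rng = random.Random(0)
    var, logp = select_branching_variable(
        candidates=[3, 7, 12],
        features=[[0.5, 1.0], [0.9, 0.1], [0.2, 0.4]],
        weights=[1.0, 2.0],
        rng=rng,
    )
    print(var, logp)
```

Sampling (rather than always taking the arg-max) keeps the policy exploratory during training; at evaluation time one would typically branch on the highest-scoring candidate instead.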

Findings

1. TGPPO outperforms existing branching policies in reducing the number of explored nodes.
2. It improves p-Primal-Dual Integrals (PDI), especially on out-of-distribution instances.
3. The approach demonstrates robustness and adaptability across diverse MILP problems.
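
For readers unfamiliar with primal-dual integrals, the sketch below computes the standard (unnormalized) primal-dual integral of a minimization solve: the area under the gap between the incumbent objective (primal bound) and the best proven lower bound (dual bound), both treated as step functions of time, up to a time limit. Smaller values mean the solver closed the gap sooner. This is an assumption-laden illustration: the exact "p-" variant reported in the paper may normalize or weight the gap differently, and the event-log format and the name `primal_dual_integral` are hypothetical. The sketch also assumes finite bounds at every logged event.

```python
from typing import Sequence, Tuple


def primal_dual_integral(
    events: Sequence[Tuple[float, float, float]],
    time_limit: float,
) -> float:
    """Integrate the primal-dual gap of a minimization solve over [0, time_limit].

    events: (time, primal_bound, dual_bound) records sorted by time. Each
            record's bounds are assumed to hold until the next record
            (piecewise-constant step function), and all bounds are finite.
    """
    total = 0.0
    for i, (t, primal, dual) in enumerate(events):
        if t >= time_limit:
            break
        # The current bounds are valid until the next event or the time limit.
        t_next = events[i + 1][0] if i + 1 < len(events) else time_limit
        t_next = min(t_next, time_limit)
        total += (primal - dual) * (t_next - t)
    return total


if __name__ == "__main__":
    log = [(0.0, 10.0, 0.0), (2.0, 8.0, 4.0), (5.0, 6.0, 6.0)]
    print(primal_dual_integral(log, time_limit=10.0))
```

In this example the gap is 10 for 2 seconds, 4 for 3 seconds, and 0 afterwards, giving an integral of 32.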

Abstract

Branch-and-Bound (B&B) is the dominant exact solution method for Mixed Integer Linear Programs (MILP), yet its exponential time complexity poses significant challenges for large-scale instances. The growing capabilities of machine learning have spurred efforts to improve B&B by learning data-driven branching policies. However, most existing approaches rely on Imitation Learning (IL), which tends to overfit to expert demonstrations and struggles to generalize to structurally diverse or unseen instances. In this work, we propose Tree-Gate Proximal Policy Optimization (TGPPO), a novel framework that employs Proximal Policy Optimization (PPO), a Reinforcement Learning (RL) algorithm, to train a branching policy aimed at improving generalization across heterogeneous MILP instances. Our approach builds on a parameterized state space representation that dynamically captures the evolving…
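
The abstract names PPO as the training algorithm. As background, the core of PPO is the clipped surrogate objective L = E[min(r·A, clip(r, 1 − ε, 1 + ε)·A)], where r is the ratio of new to old action probabilities and A is the advantage estimate; clipping keeps each policy update close to the policy that collected the data. The sketch below computes the negated objective (a loss to minimize) for a batch of branching decisions given their log-probabilities. It is a minimal sketch of generic PPO, not TGPPO's training code: value-function and entropy terms are omitted, and the function name `ppo_clip_loss` and the default ε = 0.2 are assumptions.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative PPO clipped surrogate objective, averaged over a batch.

    logp_new:   log pi_new(a|s) for each sampled branching decision.
    logp_old:   log pi_old(a|s) recorded when the decision was sampled.
    advantages: advantage estimates for the same decisions.
    """
    terms = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    # Negate so that minimizing the loss maximizes the surrogate objective.
    return -sum(terms) / len(terms)


if __name__ == "__main__":
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

With a positive advantage, raising an action's probability beyond 1 + ε times its old value earns no extra objective, which is what makes PPO updates conservative.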

Videos

Learning Branching Policies for MILPs with Proximal Policy Optimization (Underline)

Taxonomy

Topics: Constraint Satisfaction and Optimization · Reinforcement Learning in Robotics · Advanced Optimization Algorithms Research