CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound

Akhil S Anand, Elias Aarekol, Martin Mziray Dalseg, Magnus Stalhane, and Sebastien Gros

arXiv:2512.11169 · cs.AI · December 15, 2025

TL;DR

This paper introduces CORL, a reinforcement learning (RL) framework that fine-tunes mixed integer linear program (MILP) decision policies end to end through the branch and bound (B&B) solver, with the goal of improving real-world operational performance rather than model accuracy.

Contribution

The paper presents a proof-of-concept RL method that fine-tunes MILP policies end to end, without relying on supervised learning, access to true optimal decisions, or surrogates for the MILP gradients.

Findings

01. CORL successfully optimizes an MILP policy in a simple decision-making example.

02. RL fine-tuning improves the operational performance of the MILP policy.

03. As a proof of concept, the approach shows potential for real-world applications.

Abstract

Combinatorial sequential decision-making problems are typically modeled as mixed integer linear programs (MILPs) and solved via branch and bound (B&B) algorithms. The inherent difficulty of building MILPs that accurately represent stochastic real-world problems leads to suboptimal performance in the real world. Recently, machine learning methods have been applied to build MILP models for decision quality rather than for how accurately they model the real-world problem. However, these approaches typically rely on supervised learning, assume access to true optimal decisions, and use surrogates for the MILP gradients. In this work, we introduce CORL, a proof-of-concept framework that fine-tunes an MILP scheme end to end using reinforcement learning (RL) on real-world data to maximize its operational performance. We enable this by casting an MILP solved by B&B as a differentiable stochastic…
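The abstract is cut off before it describes the construction, so the following is only a minimal sketch of the general idea of tuning an integer program's parameters for decision quality with a policy gradient. It is not the authors' method. Everything in it is an assumption made for illustration: the toy problem (pick at most one of three items), the reward values, the function names, a Boltzmann distribution over feasible decisions as the stochastic policy, brute-force enumeration in place of B&B, and an exact expectation in place of sampled rollouts.

```python
import itertools
import math

def feasible_set():
    """Enumerate binary decisions with at most one item picked.
    Brute force stands in for a B&B solver in this toy (assumption)."""
    return [x for x in itertools.product([0, 1], repeat=3) if sum(x) <= 1]

def true_reward(x):
    """Hypothetical real-world payoff, not known to the MILP model."""
    return 1.0 * x[0] + 1.0 * x[1] + 3.0 * x[2]

def score(theta, x):
    """Parametric MILP objective theta . x."""
    return sum(t * xi for t, xi in zip(theta, x))

def policy(theta, xs):
    """Boltzmann distribution over feasible decisions (assumed stochastic policy)."""
    scores = [score(theta, x) for x in xs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def greedy_decision(theta, xs):
    """Deterministic MILP decision: maximize theta . x over the feasible set."""
    return max(xs, key=lambda x: score(theta, x))

def expected_reward(theta, xs):
    return sum(p * true_reward(x) for p, x in zip(policy(theta, xs), xs))

def policy_gradient(theta, xs):
    """Exact gradient of the expected reward, E[r(x) * (x - E[x])].
    A sampled REINFORCE estimate would replace this on real data."""
    probs = policy(theta, xs)
    mean_x = [sum(p * x[i] for p, x in zip(probs, xs)) for i in range(3)]
    return [
        sum(p * true_reward(x) * (x[i] - mean_x[i]) for p, x in zip(probs, xs))
        for i in range(3)
    ]

def train(theta, xs, steps=2000, lr=0.1):
    """Plain gradient ascent on the expected true reward."""
    for _ in range(steps):
        grad = policy_gradient(theta, xs)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

xs = feasible_set()
theta0 = [3.0, 3.0, 1.0]  # misspecified model: undervalues the third item
theta = train(theta0, xs)
```

In this toy, the misspecified objective `theta0` selects an item with true reward 1, while the tuned parameters make the deterministic decision select the third item, whose true reward is 3. The model parameters are judged only by the reward their decisions earn, not by how closely they match the true payoff vector.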

Taxonomy

Topics: Risk and Portfolio Optimization · Reinforcement Learning in Robotics · Constraint Satisfaction and Optimization