CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound
Akhil S Anand, Elias Aarekol, Martin Mziray Dalseg, Magnus Stalhane, and Sebastien Gros

TL;DR
This paper introduces CORL, a reinforcement learning framework that optimizes MILP decision policies directly through branch and bound (B&B), aiming to improve real-world operational performance over traditional modeling approaches.
Contribution
The paper presents a novel RL-based method to fine-tune MILP policies end-to-end, overcoming limitations of supervised learning and surrogate gradient methods.
Findings
CORL successfully optimizes MILP policies in a simple decision-making example.
Reinforcement learning improves operational performance of MILP solutions.
The approach demonstrates potential for real-world applications.
Abstract
Combinatorial sequential decision-making problems are typically modeled as mixed-integer linear programs (MILPs) and solved via branch-and-bound (B&B) algorithms. The inherent difficulty of modeling MILPs that accurately represent stochastic real-world problems leads to suboptimal performance in the real world. Recently, machine learning methods have been applied to build MILP models for decision quality rather than for how accurately they model the real-world problem. However, these approaches typically rely on supervised learning, assume access to true optimal decisions, and use surrogates for the MILP gradients. In this work, we introduce CORL, a proof-of-concept framework that fine-tunes an MILP scheme end to end using reinforcement learning (RL) on real-world data to maximize its operational performance. We enable this by casting an MILP solved by B&B as a differentiable stochastic…
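The abstract is cut off before it describes how the B&B-solved MILP becomes a differentiable stochastic policy, so the sketch below is not the paper's construction. It illustrates one generic way to get the same effect, under stated assumptions. A toy 0-1 knapsack MILP is solved exactly by a small depth-first B&B. The policy is made stochastic by adding Gaussian noise to the MILP's modeled item values `theta`. Those values are then tuned with a REINFORCE-style score-function gradient against a simulated "real" reward that the model does not know. Every name here (`branch_and_bound`, `lp_bound`, `train`, `true_values`, `sigma`, `lr`) is hypothetical.

```python
import random


def lp_bound(values, weights, capacity, fixed_value, start, order):
    """LP-relaxation upper bound: greedily fill the remaining capacity with
    items order[start:], taking a fraction of the first item that does not fit."""
    bound, remaining = fixed_value, capacity
    for i in order[start:]:
        if values[i] <= 0:
            break  # items are sorted by value/weight, so later ones are non-positive too
        if weights[i] <= remaining:
            remaining -= weights[i]
            bound += values[i]
        else:
            bound += values[i] * remaining / weights[i]
            break
    return bound


def branch_and_bound(values, weights, capacity):
    """Solve max sum v_i x_i s.t. sum w_i x_i <= capacity, x_i in {0, 1}
    with depth-first branch and bound (positive weights assumed)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    best_value, best_x = 0.0, [0] * n
    # Node: (depth in `order`, used capacity, value so far, assignment with unset items = 0)
    stack = [(0, 0.0, 0.0, [0] * n)]
    while stack:
        depth, used, value, x = stack.pop()
        if value > best_value:  # every node is itself a feasible solution
            best_value, best_x = value, x
        if depth == n:
            continue
        if lp_bound(values, weights, capacity - used, value, depth, order) <= best_value:
            continue  # prune: the relaxation cannot beat the incumbent
        i = order[depth]
        stack.append((depth + 1, used, value, x))  # branch x_i = 0
        if used + weights[i] <= capacity:          # branch x_i = 1 (explored first)
            x1 = x.copy()
            x1[i] = 1
            stack.append((depth + 1, used + weights[i], value + values[i], x1))
    return best_x


def train(theta, weights, capacity, true_values, steps=200, batch=16,
          sigma=1.0, lr=0.05, seed=0):
    """Hypothetical RL fine-tuning of the MILP's modeled values `theta`."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        samples = []
        for _ in range(batch):
            # Stochastic policy: perturb the MILP parameters, then solve exactly by B&B.
            xi = [t + rng.gauss(0.0, sigma) for t in theta]
            x = branch_and_bound(xi, weights, capacity)
            # Simulated real-world reward, based on values the MILP model does not know.
            reward = sum(v * xj for v, xj in zip(true_values, x)) + rng.gauss(0.0, 0.1)
            samples.append((xi, reward))
        baseline = sum(r for _, r in samples) / batch  # variance-reducing baseline
        for j in range(len(theta)):
            # Score of N(theta, sigma^2): d/d theta_j log p(xi) = (xi_j - theta_j) / sigma^2
            grad = sum((r - baseline) * (xi[j] - theta[j]) / sigma ** 2
                       for xi, r in samples) / batch
            theta[j] += lr * grad  # gradient ascent on expected real reward
    return theta


weights, capacity = [2, 3, 4, 5], 7
true_values = [1.0, 4.0, 5.0, 2.0]  # unknown to the MILP; best plan is items 1 and 2
theta = train([2.0, 2.0, 2.0, 2.0], weights, capacity, true_values)
print(branch_and_bound(theta, weights, capacity))
```

The B&B itself is never differentiated here. Gradients flow only through the log-density of the parameter noise, so the solver can stay a black box. Reparameterized, perturbation-based, or smoothed-solver gradients are alternative design choices, and the paper may use a different one.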
Taxonomy
Topics: Risk and Portfolio Optimization · Reinforcement Learning in Robotics · Constraint Satisfaction and Optimization
