Learning to Cut: Reinforcement Learning for Benders Decomposition

Haochen Cai; Xian Yu

arXiv:2605.06516·math.OC·May 8, 2026

TL;DR

This paper introduces RLBD, a reinforcement learning framework that adaptively selects cuts in Benders decomposition for two-stage stochastic programs. In the reported experiments, it solves faster than vanilla BD and generalizes to problems with different data and sizes.

Contribution

The paper presents an RL-based approach to cut selection in Benders decomposition that outperforms both vanilla BD and LearnBD, a supervised learning method, in the reported experiments.

Findings

01. RLBD reduces computation time compared to vanilla BD.

02. RLBD outperforms LearnBD, the supervised learning approach.

03. RLBD generalizes well to problems with different data and sizes.

Abstract

Benders decomposition (BD) is a widely used solution approach for solving two-stage stochastic programs arising in real-world decision-making under uncertainty. However, it often suffers from slow convergence as the master problem grows with an increasing number of cuts. In this paper, we propose Reinforcement Learning for BD (RLBD), a framework that adaptively selects cuts using a neural network-based stochastic policy. The policy is trained using a policy gradient method via the REINFORCE algorithm. We evaluate the proposed approach on a two-stage stochastic electric vehicle charging station location problem and compare it with vanilla BD and LearnBD, a supervised learning approach that classifies cuts using a support vector machine. Numerical results demonstrate that RLBD achieves substantial improvements in computational efficiency and exhibits strong generalization to problems with…
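
To make the idea of a stochastic cut-selection policy trained with REINFORCE more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a linear-logistic policy instead of the neural network described in the abstract, an independent keep/drop decision per candidate cut, and a mean-return baseline for variance reduction. The cut features, the episode return, and all function names are hypothetical.

```python
import numpy as np


def keep_probabilities(theta, features):
    """Probability of keeping each candidate cut (logistic of a linear score).

    Assumption: the paper uses a neural network; a linear score keeps the sketch short.
    """
    return 1.0 / (1.0 + np.exp(-(features @ theta)))


def sample_selection(theta, features, rng):
    """Sample a 0/1 keep decision for every cut from the stochastic policy."""
    probs = keep_probabilities(theta, features)
    return (rng.random(probs.shape) < probs).astype(float)


def grad_log_prob(theta, features, actions):
    """Gradient of log pi(actions | features) for independent Bernoulli decisions."""
    probs = keep_probabilities(theta, features)
    return features.T @ (actions - probs)


def reinforce_step(theta, episodes, lr=0.1):
    """One REINFORCE update from a batch of (features, actions, episode_return).

    The mean-return baseline is an assumption added here for variance reduction.
    """
    returns = np.array([ret for _, _, ret in episodes], dtype=float)
    baseline = returns.mean()
    grad = np.zeros_like(theta, dtype=float)
    for features, actions, ret in episodes:
        grad += (ret - baseline) * grad_log_prob(theta, features, actions)
    # Gradient ascent on the expected return.
    return theta + lr * grad / len(episodes)
```

In a BD loop, each episode would be one solve of an instance: at every iteration the policy decides which generated cuts enter the master problem, and the episode return would reward fast convergence (for example, negative total solve time). The exact state, action, and reward definitions used by RLBD are not given in the text above, so those choices are left open here.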
