Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis

Orit Davidovich; Shimrit Shtern; Segev Wasserkrug; Nimrod Megiddo

arXiv:2512.08601 · stat.ML · January 21, 2026




TL;DR

This paper introduces a unified reinforcement learning (RL) framework for combinatorial optimization (CO). It provides theoretical guarantees on convergence and optimality gaps, which advances the understanding of neural heuristics for hard CO problems.

Contribution

It develops a comprehensive MDP-based model of CO problems. It also gives a convergence analysis of value-based RL methods, with explicit conditions and guarantees.

Findings

01. Establishes conditions under which RL converges to approximate solutions of CO problems.

02. Provides bounds on optimality gaps in terms of problem and algorithm parameters.

03. Highlights the importance of the choice of state-space embedding.

Abstract

Since the 1990s, considerable empirical work has been carried out to train statistical models, such as neural networks (NNs), as learned heuristics for combinatorial optimization (CO) problems. When successful, such an approach eliminates the need for experts to design heuristics per problem type. Due to their structure, many hard CO problems are amenable to treatment through reinforcement learning (RL). Indeed, we find a wealth of literature training NNs using value-based, policy gradient, or actor-critic approaches, with promising results, both in terms of empirical optimality gaps and inference runtimes. Nevertheless, there has been a paucity of theoretical work undergirding the use of RL for CO problems. To this end, we introduce a unified framework to model CO problems through Markov decision processes (MDPs) and solve them using RL techniques. We provide easy-to-test assumptions…
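The following is a minimal illustration of the general idea of casting a CO problem as an MDP and solving it with value-based RL. It is a hypothetical toy sketch and does not reproduce the paper's formulation, assumptions, or bounds.

The problem is a tiny 0/1 knapsack instance (the numbers are made up). The MDP is set up as follows:

- A state `(i, c)` is the index of the next item to decide together with the remaining capacity.
- An action is either skip (0) or take (1).
- The reward is the value of the item taken.

Tabular Q-learning stands in for "value-based RL". The greedy policy with respect to the learned Q-values then acts as the learned heuristic. Its value is compared with the exact optimum from the Bellman recursion, which gives a relative optimality gap.

```python
import random
from functools import lru_cache

# Hypothetical toy 0/1 knapsack instance (illustrative only, not from the paper).
VALUES = [6, 10, 12]
WEIGHTS = [1, 2, 3]
CAPACITY = 5
N = len(VALUES)


def actions(state):
    """Feasible actions in state (i, c): 0 = skip item i, 1 = take it (only if it fits)."""
    i, c = state
    return [0, 1] if WEIGHTS[i] <= c else [0]


def step(state, action):
    """Deterministic transition; the reward is the value of the item taken (0 if skipped)."""
    i, c = state
    if action == 1:
        return (i + 1, c - WEIGHTS[i]), VALUES[i]
    return (i + 1, c), 0


def is_terminal(state):
    return state[0] == N  # every item has been decided


def q_learning(episodes=2000, alpha=0.5, epsilon=0.2, seed=0):
    """Tabular, undiscounted Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return-to-go; missing entries read as 0.0
    for _ in range(episodes):
        s = (0, CAPACITY)
        while not is_terminal(s):
            acts = actions(s)
            if rng.random() < epsilon:
                a = rng.choice(acts)  # explore
            else:
                a = max(acts, key=lambda x: q.get((s, x), 0.0))  # exploit
            s_next, r = step(s, a)
            # Bootstrapped target: no future value once all items are decided.
            future = 0.0 if is_terminal(s_next) else max(
                q.get((s_next, b), 0.0) for b in actions(s_next))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + future - old)
            s = s_next
    return q


def greedy_value(q):
    """Roll out the greedy policy w.r.t. q (the learned heuristic) and return its total value."""
    s, total = (0, CAPACITY), 0
    while not is_terminal(s):
        a = max(actions(s), key=lambda x: q.get((s, x), 0.0))
        s, r = step(s, a)
        total += r
    return total


@lru_cache(maxsize=None)
def optimal_value(i=0, c=CAPACITY):
    """Exact optimum via the Bellman recursion on the same MDP (tractable only for tiny instances)."""
    if i == N:
        return 0
    best = optimal_value(i + 1, c)
    if WEIGHTS[i] <= c:
        best = max(best, VALUES[i] + optimal_value(i + 1, c - WEIGHTS[i]))
    return best


if __name__ == "__main__":
    heuristic = greedy_value(q_learning())
    opt = optimal_value()
    gap = (opt - heuristic) / opt  # relative optimality gap for a maximization problem
    print(f"heuristic={heuristic} optimum={opt} gap={gap:.3f}")
```

On realistic instances the table `q` would be replaced by a function approximator, such as a neural network over an embedding of the state. That is where the choice of state-space embedding discussed in the paper comes into play. The exact optimum would also be unavailable, so gaps would be measured against a reference solver instead.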


Taxonomy

Topics: Reinforcement Learning in Robotics · Risk and Portfolio Optimization · Advanced Multi-Objective Optimization Algorithms