Preference Optimization for Combinatorial Optimization Problems

Mingjun Pan; Guanquan Lin; You-Wei Luo; Bin Zhu; Zhien Dai; Lijun Sun; Chun Yuan

arXiv:2505.08735·cs.LG·May 14, 2025

TL;DR

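This paper introduces Preference Optimization, a reinforcement learning method that converts quantitative reward signals into qualitative preference signals, improving convergence efficiency and solution quality on combinatorial optimization problems such as the Traveling Salesman Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP), and the Flexible Flow Shop Problem (FFSP).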

Contribution

The paper proposes a preference-based RL framework that combines an entropy-regularized objective with local search integration, and it outperforms existing methods on benchmark problems.
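
One standard way to see how an entropy-regularized objective lets the reward be written in terms of the policy is the maximum-entropy derivation sketched below. The notation is assumed here for illustration ($x$ for a problem instance, $\tau$ for a sampled solution, $\alpha$ for the entropy weight, $Z(x)$ for the normalizer) and may differ from the paper's own formulation.

```latex
\begin{align}
\pi^{*} &= \arg\max_{\pi}\; \mathbb{E}_{\tau \sim \pi(\cdot \mid x)}\big[ r(x,\tau) \big]
          + \alpha\, \mathcal{H}\big(\pi(\cdot \mid x)\big), \\
\pi^{*}(\tau \mid x) &= \frac{\exp\big(r(x,\tau)/\alpha\big)}{Z(x)},
  \qquad Z(x) = \sum_{\tau'} \exp\big(r(x,\tau')/\alpha\big), \\
r(x,\tau) &= \alpha \log \pi^{*}(\tau \mid x) + \alpha \log Z(x), \\
r(x,\tau_1) - r(x,\tau_2) &= \alpha \big[ \log \pi^{*}(\tau_1 \mid x) - \log \pi^{*}(\tau_2 \mid x) \big].
\end{align}
```

The normalizer $Z(x)$ sums over every feasible solution and is intractable for combinatorial problems, but it cancels in the reward difference. A preference model that depends only on reward differences can therefore be evaluated from policy log-probabilities alone, which is plausibly the sense in which the method avoids intractable computations.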

Findings

01. Significantly better convergence efficiency than existing methods.

02. Higher solution quality on benchmark problems such as TSP, CVRP, and FFSP.

03. Effective escape from local optima.

Abstract

Reinforcement Learning (RL) has emerged as a powerful tool for neural combinatorial optimization, enabling models to learn heuristics that solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast combinatorial action spaces, leading to inefficiency. In this paper, we propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling, emphasizing the superiority among sampled solutions. Methodologically, by reparameterizing the reward function in terms of policy and utilizing preference models, we formulate an entropy-regularized RL objective that aligns the policy directly with preferences while avoiding intractable computations. Furthermore, we…
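
The idea of turning rewards into preferences can be illustrated with a minimal sketch. It assumes a Bradley-Terry comparison model and uses a hypothetical function name (`pairwise_preference_loss`) and a hypothetical scale parameter (`alpha`); the paper's actual preference models and loss may differ. Only the ordering of the costs enters the loss, not their magnitudes, so the signal does not shrink as sampled solutions become close in quality.

```python
import numpy as np


def pairwise_preference_loss(log_probs, costs, alpha=1.0):
    """Bradley-Terry style loss over all ordered pairs of sampled solutions.

    log_probs: shape (n,), log-probability of each sampled solution under the policy.
    costs:     shape (n,), objective value of each solution (lower is better).
    alpha:     hypothetical scale on the log-probability gap (an assumption).
    """
    log_probs = np.asarray(log_probs, dtype=float)
    costs = np.asarray(costs, dtype=float)

    # prefer[i, j] is True when solution i has strictly lower cost than solution j.
    prefer = costs[:, None] < costs[None, :]
    if not prefer.any():
        # All costs are tied, so there is no preference signal.
        return 0.0

    # Implicit reward gap between solutions i and j, expressed through the policy.
    margin = alpha * (log_probs[:, None] - log_probs[None, :])

    # -log(sigmoid(m)) == log(1 + exp(-m)), computed stably with logaddexp.
    losses = np.logaddexp(0.0, -margin)
    return float(losses[prefer].mean())


if __name__ == "__main__":
    # Two sampled tours: the first is shorter and also more likely under the policy.
    print(pairwise_preference_loss(log_probs=[-1.0, -2.0], costs=[10.0, 12.0]))
```

Minimizing this quantity raises the log-probability of the preferred solution in each pair relative to the other. In a training loop the same expression would be written in an automatic-differentiation framework so that gradients flow to the policy parameters; NumPy is used here only to keep the sketch self-contained.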
