Rethinking Efficiency in Neural Combinatorial Optimization: Batched Preference Optimization with Mamba
Zhenxing Xu, Zeyuan Ma, Weidong Bao, Yan Zheng, Ji Wang, Zhiguang Cao

TL;DR
This paper introduces ECO, a neural combinatorial optimization framework that improves efficiency through batched preference optimization, decoupled training stages, and a memory-efficient Mamba encoder-decoder architecture, outperforming existing neural baselines on TSP and CVRP.
Contribution
ECO combines batched preference optimization with a Mamba backbone and local-search-guided training to improve efficiency and performance in neural combinatorial optimization tasks.
Findings
ECO outperforms existing neural baselines on TSP and CVRP.
ECO reduces memory usage and increases throughput.
Local search improves preference margins during training.
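One way to picture how local search could widen a preference margin is the sketch below. It assumes a TSP setting and uses plain 2-opt as the local search; the summary does not say which operator ECO uses, so `two_opt` and `tour_length` are hypothetical helpers chosen for illustration only. Improving the preferred tour lowers its cost, which increases the cost gap to the rejected tour.

```python
import math

def tour_length(tour, coords):
    """Length of a closed tour over 2-D points."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def two_opt(tour, coords):
    """Assumed local search: reverse segments while that shortens the tour."""
    best = list(tour)
    best_len = tour_length(best, coords)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reverse the segment best[i..j] and keep it if strictly shorter.
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(cand, coords)
                if cand_len < best_len - 1e-12:
                    best, best_len = cand, cand_len
                    improved = True
    return best

# Toy instance: the four corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
sampled = [0, 2, 1, 3]             # a self-crossing tour
refined = two_opt(sampled, coords)  # uncrossed by 2-opt
```

In this toy case the refined tour is no longer than the sampled one, so pairing the refined tour against any fixed rejected tour gives a cost gap at least as large as before.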
Abstract
We study efficiency as a first-class objective in Neural Combinatorial Optimization (NCO) and present ECO, an efficient learning framework that combines batched preference optimization with a Mamba backbone. Instead of tightly interleaving every policy update with on-policy rollouts, ECO decouples trajectory generation from gradient updates through two stages: supervised warm-up on pre-computed solutions and iterative Direct Preference Optimization (DPO) on batched candidate sets generated by the current policy. We pair this learning pipeline with a mixed Mamba encoder-decoder that reduces memory growth on long sequences and improves hardware utilization. A local-search-guided bootstrapping strategy is further used during training to widen preference margins and stabilize iterative improvement. Importantly, local search is only used to construct stronger preference pairs during training…
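The second stage described above, DPO on batched candidate sets, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a TSP instance, a best-versus-worst pairing rule within each candidate set, and the standard DPO loss with a frozen reference policy. The names `build_preference_pair`, `dpo_loss`, and the value of `beta` are hypothetical.

```python
import math

def tour_length(tour, coords):
    """Length of a closed tour over 2-D points."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def build_preference_pair(candidates, coords):
    """Assumed pairing rule: shortest tour is preferred, longest is rejected."""
    ranked = sorted(candidates, key=lambda t: tour_length(t, coords))
    return ranked[0], ranked[-1]

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair: -log sigmoid(beta * margin).

    logp_* are sequence log-probabilities under the current policy,
    ref_logp_* under the frozen reference policy.
    """
    x = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(x)) = log(1 + exp(-x)), evaluated in a numerically stable way.
    return math.log1p(math.exp(-x)) if x > 0 else -x + math.log1p(math.exp(x))

# Toy candidate set for one instance (unit-square corners).
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
candidates = [[0, 2, 1, 3], [0, 1, 2, 3]]
winner, loser = build_preference_pair(candidates, coords)

# Made-up log-probabilities, for illustration only.
loss_neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
loss_preferring_winner = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

Because candidate sets are generated in batches and only then scored, the gradient step over many such pairs does not need to be interleaved with rollouts; the loss falls as the policy assigns relatively more probability to the preferred tour than the reference does.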
