RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research
Ching Pui Wan, Tung Li, Jason Min Wang

TL;DR
RLOR is a flexible deep reinforcement learning framework for operations research. It makes it easier to integrate recent RL advances and to customize model architectures. The authors demonstrate it on vehicle routing problems, where a re-implemented Attention Model trained with PPO in CleanRL trains at least 8 times faster.
Contribution
The paper introduces RLOR, a framework for applying deep reinforcement learning to operations research problems that supports both customizing model architectures and incorporating recent RL techniques.
Findings
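Re-implemented the Attention Model and trained it with Proximal Policy Optimization (PPO) in CleanRL, achieving at least an 8x speedup in training time.
Showed that end-to-end autoregressive models for vehicle routing problems benefit from recent RL advances once the model architecture is carefully re-implemented.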
Publicly released code for reproducibility and further research.
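For readers unfamiliar with PPO, the clipped surrogate objective it optimizes can be sketched as below. This is a minimal illustration in plain Python, not the RLOR or CleanRL implementation; the function name `ppo_clip_loss` and the default `clip_coef=0.2` are assumptions chosen for the example.

```python
import math

def ppo_clip_loss(new_logp, old_logp, advantages, clip_coef=0.2):
    """Mean PPO clipped surrogate loss (to be minimized) over a batch.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    clip_coef is an assumed illustrative default, not a value from the paper.
    """
    losses = []
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
        # PPO maximizes min(ratio * A, clipped * A); negate to get a loss.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

Clipping limits how far a single update can move the policy away from the one that generated the data, which is what allows several gradient steps on the same batch of rollouts.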
Abstract
Reinforcement learning has been applied in operation research and has shown promise in solving large combinatorial optimization problems. However, existing works focus on developing neural network architectures for certain problems. These works lack the flexibility to incorporate recent advances in reinforcement learning, as well as the flexibility of customizing model architectures for operation research problems. In this work, we analyze the end-to-end autoregressive models for vehicle routing problems and show that these models can benefit from the recent advances in reinforcement learning with a careful re-implementation of the model architecture. In particular, we re-implemented the Attention Model and trained it with Proximal Policy Optimization (PPO) in CleanRL, showing at least 8 times speed up in training time. We hereby introduce RLOR, a flexible framework for Deep…
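To make "end-to-end autoregressive model" concrete, the sketch below builds a route one node at a time while masking nodes that were already visited. It is a toy illustration only: the nearest-neighbor score stands in for the learned policy of a model such as the Attention Model, and the names `greedy_tour`, `tour_length`, and `score_fn` are hypothetical, not part of RLOR.

```python
import math

def greedy_tour(coords, score_fn=None, start=0):
    """Build a tour autoregressively, masking already-visited nodes."""
    n = len(coords)
    if score_fn is None:
        # Stand-in for a learned policy: prefer the nearest unvisited node.
        score_fn = lambda cur, nxt: -math.dist(coords[cur], coords[nxt])
    visited = [False] * n
    visited[start] = True
    tour = [start]
    while len(tour) < n:
        cur = tour[-1]
        # Mask: only unvisited nodes are feasible actions at this step.
        candidates = [j for j in range(n) if not visited[j]]
        nxt = max(candidates, key=lambda j: score_fn(cur, j))
        visited[nxt] = True
        tour.append(nxt)
    return tour

def tour_length(coords, tour):
    """Length of the closed tour, returning to the starting node."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )
```

In a learned model, `score_fn` would be replaced by the policy network's output for the current partial route, and actions would typically be sampled during training rather than chosen greedily.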
Taxonomy
Topics: Traffic control and management · Vehicle Routing Optimization Methods · Assembly Line Balancing Optimization
