Sample-Efficient, Exploration-Based Policy Optimisation for Routing Problems
Nasrin Sultana, Jeffrey Chan, Tabinda Sarwar, A. K. Qin

TL;DR
This paper introduces a sample-efficient, exploration-based reinforcement learning method that uses entropy maximization and off-policy techniques to improve solution quality and speed on routing problems such as the travelling salesman problem (TSP) and the vehicle routing problem (VRP).
Contribution
It proposes a novel entropy-based, off-policy reinforcement learning approach that enhances sample efficiency and generalizes across various routing problems.
Findings
Outperforms state-of-the-art methods in solution quality.
Reduces computation time significantly.
Generalizes to different routing problem sizes.
Abstract
Model-free deep reinforcement learning algorithms have been applied to a range of combinatorial optimisation problems (COPs)~\cite{bello2016neural}~\cite{kool2018attention}~\cite{nazari2018reinforcement}. However, these approaches face two key challenges when applied to combinatorial problems: insufficient exploration, and the need for many training examples from the search space to achieve reasonable performance. Combinatorial optimisation can be complex, with large search spaces that contain many optima. A method is therefore needed that finds good solutions while being more sample efficient. This paper presents a new reinforcement learning approach based on entropy. In addition, we design an off-policy reinforcement learning technique that maximises the expected return and improves sample efficiency to achieve faster learning…
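The abstract does not state the objective itself. As a rough illustration of what an entropy-based objective generally looks like, the sketch below adds an entropy bonus to the per-step reward of a tour-construction policy. This is the generic maximum-entropy form, not the authors' formulation; the function names, the temperature `alpha`, and the toy rewards are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_regularised_return(rewards, step_probs, alpha=0.1):
    """Sum over steps of reward plus alpha times the policy entropy at that step.

    alpha is a hypothetical temperature that trades exploration against reward.
    """
    return sum(r + alpha * entropy(p) for r, p in zip(rewards, step_probs))

# Toy 3-step tour construction. Rewards are negative edge lengths (assumed).
rewards = [-1.0, -2.0, -1.5]
# Policy over the remaining nodes at each step: uniform, peaked, then forced.
step_probs = [softmax([0.0, 0.0, 0.0]), softmax([2.0, 0.0]), [1.0]]
objective = entropy_regularised_return(rewards, step_probs, alpha=0.1)
```

A uniform choice over the remaining nodes earns the largest bonus and a forced choice earns none, so a policy optimised for this quantity is rewarded for keeping several next-node options open while the tour length is still being learned.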
Taxonomy
Topics: Vehicle Routing Optimization Methods · Robotic Path Planning Algorithms · Smart Parking Systems Research
