An End-to-End Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drones

Taihelong Zeng; Yun Lin; Yuhe Shi; Yan Li; Zhiqing Wei; Xuanru Ji

arXiv:2511.05265 · cs.LG · November 10, 2025

Open Access

TL;DR

This paper introduces a hierarchical deep reinforcement learning framework that pairs a Transformer-inspired encoder with a Minimal Gated Unit decoder to solve the TSP with Drones (TSP-D), reporting better solution quality and training efficiency than existing methods.

Contribution

The paper presents a hierarchical Actor-Critic deep reinforcement learning model with a specialized attention mechanism for TSP-D, improving solution quality and training speed.

Findings

01 Achieves competitive or superior solutions on TSP-D instances of various scales.

02 Reduces training time significantly compared to existing reinforcement learning algorithms.

03 Provides faster computation times than traditional heuristic algorithms.

Abstract

The emergence of truck-drone collaborative systems in last-mile logistics has positioned the Traveling Salesman Problem with Drones (TSP-D) as a pivotal extension of classical routing optimization, where synchronized vehicle coordination promises substantial operational efficiency and reduced environmental impact, yet introduces NP-hard combinatorial complexity beyond the reach of conventional optimization paradigms. Deep reinforcement learning offers a theoretically grounded framework to address TSP-D's inherent challenges through self-supervised policy learning and adaptive decision-making. This study proposes a hierarchical Actor-Critic deep reinforcement learning framework for solving the TSP-D problem. The architecture consists of two primary components: a Transformer-inspired encoder and an efficient Minimal Gated Unit decoder. The encoder incorporates a novel, optimized k-nearest…
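
The abstract names a Minimal Gated Unit (MGU) decoder but does not spell out its equations here. As a rough illustration only, the sketch below implements one step of the standard MGU recurrence (a single forget gate that also controls the state update) in plain Python. It is not the authors' decoder: the helper names `affine`, `mgu_step`, and `init_params`, the dimensions, and the random initialization are all assumptions made for this example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W @ v + b, where W is a list of rows (hypothetical helper)."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def mgu_step(x, h_prev, params):
    """One step of a standard MGU cell; an assumption, not the paper's exact decoder."""
    W_f, b_f, W_h, b_h = params
    # Forget gate, computed from the previous hidden state and the input.
    f = [sigmoid(z) for z in affine(W_f, h_prev + x, b_f)]
    # Candidate state sees the gated previous state and the input.
    gated = [fi * hi for fi, hi in zip(f, h_prev)]
    h_cand = [math.tanh(z) for z in affine(W_h, gated + x, b_h)]
    # Convex combination of the old state and the candidate.
    return [(1.0 - fi) * hi + fi * ci for fi, hi, ci in zip(f, h_prev, h_cand)]

def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights and zero biases (illustrative initialization)."""
    rng = random.Random(seed)
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim + input_dim)]
                for _ in range(hidden_dim)]
    return mat(), [0.0] * hidden_dim, mat(), [0.0] * hidden_dim

# Usage: run the cell over two made-up 2-D inputs (e.g. node coordinates).
params = init_params(input_dim=2, hidden_dim=4)
h = [0.0] * 4
for x in [[0.1, 0.9], [0.5, 0.2]]:
    h = mgu_step(x, h, params)
```

Compared with a GRU, this cell uses one gate instead of two, which reduces the number of recurrent parameters; that is one plausible reason such a unit would be described as efficient, though the paper's own motivation may differ.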

Taxonomy

Topics: UAV Applications and Optimization · Vehicle Routing Optimization Methods · Transportation and Mobility Innovations