Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems

Zengyu Zou; Jingyuan Wang; Yixuan Huang; Junjie Wu

arXiv:2511.17435 · cs.LG · December 18, 2025

TL;DR

This paper introduces the Multi-Agent Pointer Transformer, a novel reinforcement learning framework that improves decision-making efficiency and effectiveness in complex multi-vehicle pickup and delivery problems with dynamic requests.

Contribution

It proposes a Transformer-based neural network architecture with relation-aware attention and informative priors for joint multi-vehicle decision-making in dynamic routing tasks.
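One common way to make attention "relation-aware" is to add a bias, computed from pairwise relation features, to the attention logits before the softmax. The sketch below illustrates that general idea only; it is an assumption about what such a layer could look like, not the authors' formulation. The function name `relation_aware_attention`, the relation tensor `rel` (for example, distances or pickup-delivery pairings between entities), and the projection `w_r` are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relation_aware_attention(h, rel, w_q, w_k, w_v, w_r):
    """Single-head attention with an additive pairwise relation bias.

    h:   (n, d)    entity embeddings (vehicles, requests, ...)
    rel: (n, n, r) hypothetical pairwise relation features
    w_q, w_k, w_v: (d, d) projections; w_r: (r,) relation projection
    Returns the (n, d) attended outputs and the (n, n) attention weights.
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    logits = q @ k.T / np.sqrt(k.shape[-1])   # standard scaled dot-product
    logits = logits + rel @ w_r               # assumed relation bias, shape (n, n)
    attn = softmax(logits, axis=-1)
    return attn @ v, attn
```

With a zero relation tensor this reduces to ordinary scaled dot-product attention, so the relation term acts purely as a learned correction to how strongly each pair of entities attends to one another.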

Findings

01 MAPT outperforms baseline methods in performance metrics.

02 MAPT achieves faster computation times than classical optimization methods.

03 The framework effectively models inter-entity relationships in multi-vehicle routing.

Abstract

This paper addresses the cooperative Multi-Vehicle Dynamic Pickup and Delivery Problem with Stochastic Requests (MVDPDPSR) and proposes an end-to-end centralized decision-making framework based on sequence-to-sequence modeling, named Multi-Agent Pointer Transformer (MAPT). MVDPDPSR is an extension of the vehicle routing problem and a spatio-temporal system optimization problem, widely applied in scenarios such as on-demand delivery. Classical operations research methods face bottlenecks in computational complexity and time efficiency when handling large-scale dynamic problems. Although existing reinforcement learning methods have achieved some progress, they still encounter several challenges: 1) independent decoding across multiple vehicles fails to model joint action distributions; 2) the feature extraction network struggles to capture inter-entity relationships; 3) the joint action space is…
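Challenge 1 in the abstract contrasts independent per-vehicle decoding with modeling a joint action distribution. A sequence-to-sequence decoder can address this by choosing each vehicle's action conditioned on the actions already chosen, so that the joint log-probability is the sum of conditional log-probabilities. The following is a minimal sketch of that idea under stated assumptions, not the paper's decoder: the function `decode_joint_action`, the additive context update, and the rule that a node is assigned to at most one vehicle per step are all hypothetical simplifications.

```python
import numpy as np

def decode_joint_action(vehicle_emb, node_emb, rng):
    """Autoregressively pick one node per vehicle with pointer-style scores.

    vehicle_emb: (m, d) vehicle embeddings; node_emb: (n, d) node embeddings,
    with n >= m. Returns the chosen node indices and the joint log-probability
    log p(a_1, ..., a_m) = sum_k log p(a_k | a_1, ..., a_{k-1}).
    """
    m, n = vehicle_emb.shape[0], node_emb.shape[0]
    d = node_emb.shape[1]
    taken = np.zeros(n, dtype=bool)      # nodes already assigned (assumed rule)
    context = np.zeros(d)                # summary of earlier vehicles' choices
    actions, log_prob = [], 0.0
    for k in range(m):
        query = vehicle_emb[k] + context             # condition on earlier actions
        scores = node_emb @ query / np.sqrt(d)       # pointer scores over nodes
        scores = np.where(taken, -np.inf, scores)    # mask infeasible nodes
        scores = scores - scores[~taken].max()       # stabilise the softmax
        probs = np.exp(scores)
        probs = probs / probs.sum()
        a = int(rng.choice(n, p=probs))
        actions.append(a)
        log_prob += float(np.log(probs[a]))
        taken[a] = True
        context = context + node_emb[a]
    return actions, log_prob
```

Because each vehicle's distribution depends on the earlier choices through `context` and the mask, two vehicles cannot select the same node here, which is something independently decoded per-vehicle policies cannot guarantee.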

Taxonomy

Topics: Vehicle Routing Optimization Methods · Transportation and Mobility Innovations · Traffic control and management