SEAFormer: A Spatial Proximity and Edge-Aware Transformer for Real-World Vehicle Routing Problems
Saeed Nasehi Basharzad, Farhana Choudhury, Egemen Tanin

TL;DR
SEAFormer is a novel transformer model that effectively incorporates node and edge information, utilizing locality-aware clustering and residual fusion, to solve large-scale real-world vehicle routing problems with improved efficiency and accuracy.
Contribution
It introduces Clustered Proximity Attention and an edge-aware module, enabling efficient large-scale RWVRP solving and better utilization of sequence dependencies and edge information.
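To make the idea of an edge-aware module with residual fusion concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The per-node edge summaries (mean and minimum outgoing and incoming cost), the names `edge_features` and `residual_edge_fusion`, and the fixed matrix `w` standing in for a learned projection are all hypothetical. The sketch only illustrates one way edge-level information, including asymmetric travel costs, can be projected and added residually to node embeddings.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def edge_features(cost: Sequence[Sequence[float]], i: int) -> List[float]:
    """Hypothetical summary of the (possibly asymmetric) edges touching node i:
    mean and min outgoing cost, then mean and min incoming cost. Needs n >= 2."""
    n = len(cost)
    out_costs = [cost[i][j] for j in range(n) if j != i]
    in_costs = [cost[j][i] for j in range(n) if j != i]
    return [sum(out_costs) / len(out_costs), min(out_costs),
            sum(in_costs) / len(in_costs), min(in_costs)]


def residual_edge_fusion(h: Matrix, cost: Sequence[Sequence[float]],
                         w: Matrix) -> Matrix:
    """Hypothetical residual fusion h_i' = h_i + W @ edge_features(i).
    `w` has shape d x 4 and stands in for a learned projection."""
    fused = []
    for i, hi in enumerate(h):
        e = edge_features(cost, i)
        proj = [sum(row[m] * e[m] for m in range(len(e))) for row in w]
        fused.append([a + b for a, b in zip(hi, proj)])  # residual addition
    return fused


# Asymmetric travel costs: cost[i][j] != cost[j][i] in general.
cost = [[0.0, 1.0, 4.0], [3.0, 0.0, 2.0], [5.0, 6.0, 0.0]]
print(edge_features(cost, 0))  # → [2.5, 1.0, 4.0, 3.0]
```

Because incoming and outgoing costs are summarised separately, the two directions of an asymmetric edge contribute different features, which a purely coordinate-based node embedding would not capture.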
Findings
Outperforms state-of-the-art methods on four RWVRP variants.
First neural method to solve 1,000+ node RWVRPs effectively.
Achieves superior performance on classic VRPs.
Abstract
Real-world Vehicle Routing Problems (RWVRPs) require solving complex, sequence-dependent challenges at scale with constraints such as delivery time windows, replenishment or recharging stops, and asymmetric travel costs. While recent neural methods achieve strong results on large-scale classical VRP benchmarks, they struggle to address RWVRPs because their strategies overlook sequence dependencies and underutilize edge-level information, which are precisely the characteristics that define the complexity of RWVRPs. We present SEAFormer, a novel transformer that incorporates both node-level and edge-level information in decision-making through two key innovations. First, our Clustered Proximity Attention (CPA) exploits locality-aware clustering to reduce the complexity of attention from O(n²) to O(n) while preserving a global perspective, allowing SEAFormer to efficiently train on large…
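The abstract does not spell out how CPA forms its clusters; one reviewer describes it as relying on geometric priors of polar coordinates. The sketch below assumes that reading: nodes are sorted by polar angle around the depot, cut into consecutive clusters of size M (`cluster_size`), and each node attends only within its cluster; R (`rounds`) partitions with rotated angular offsets are averaged so that nodes near a cluster boundary also see neighbours across it. The function names, the rotation scheme, and the use of raw embeddings as queries, keys, and values (no learned projections) are illustrative assumptions, not the paper's method.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Point = Tuple[float, float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def polar_order(coords: Sequence[Point], depot: Point, offset: float) -> List[int]:
    """Node indices sorted by polar angle around the depot, rotated by `offset` radians."""
    def angle(i: int) -> float:
        dx, dy = coords[i][0] - depot[0], coords[i][1] - depot[1]
        return (math.atan2(dy, dx) + offset) % (2 * math.pi)
    return sorted(range(len(coords)), key=angle)


def clustered_attention(x: Sequence[Vector], coords: Sequence[Point], depot: Point,
                        cluster_size: int, rounds: int) -> List[Vector]:
    """Hypothetical clustered self-attention: each node attends only to nodes in its
    angular cluster; outputs of `rounds` differently rotated partitions are averaged."""
    n, d = len(x), len(x[0])
    out = [[0.0] * d for _ in range(n)]
    scale = 1.0 / math.sqrt(d)
    for r in range(rounds):
        # A different angular offset per round shifts the cluster boundaries.
        order = polar_order(coords, depot, offset=2 * math.pi * r / rounds)
        for start in range(0, n, cluster_size):
            cluster = order[start:start + cluster_size]
            for i in cluster:
                # Scaled dot-product scores restricted to the cluster: O(M) per node.
                weights = softmax([scale * sum(a * b for a, b in zip(x[i], x[j]))
                                   for j in cluster])
                for w, j in zip(weights, cluster):
                    for k in range(d):
                        out[i][k] += w * x[j][k] / rounds
    return out
```

Each round scores every node against at most M cluster members, so the work grows with n·R·M rather than n²; with `cluster_size >= n` and `rounds=1` the function reduces to ordinary dense self-attention.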
Peer Reviews
Decision: Submitted to ICLR 2026
1. This paper proposes a novel transformer-based framework for solving four real-world vehicle routing variants with diverse problem sizes. 2. The proposed CPA computes attention scores using locality-aware clustering and achieves O(n) complexity while preserving a global perspective. 3. Experimental results show the superiority of the proposed method against baselines across various problem sizes and problem constraints.
1. While CPA and edge modules are well-motivated, the ideas of developing attention with linear complexity and exploiting edge-level information are not novel [1][2][3], and they also build on existing sparse attention and residual fusion ideas (e.g., Reformer, FlashAttention, GAT-based methods). 2. The proposed CPA incorporates some hyper-parameters, e.g., Cluster Size (M) and Partitioning Rounds (R). However, it lacks a systematic analysis of their sensitivity and effects to the model perform
1. SEAFormer demonstrates competitive performance across multiple RWVRP variants. 2. The introduction of Clustered Proximity Attention (CPA) effectively reduces the computational complexity of traditional attention mechanisms. 3. The edge-aware module provides a practical solution for incorporating edge-level information, enhancing model accuracy and convergence speed.
1. The definition of “real-world VRP” is broad, and it is unclear what specific challenges are being addressed. Although variants like VRPTW and EVRPCS are mentioned, the paper lacks a clear explanation of why existing methods cannot be extended to these variants. The motivations behind CPA and the edge-aware module are more engineering-oriented, with little exploration of the underlying theoretical mechanisms. 2. The CPA approach lacks clear innovation when compared to existing local attention
Important and Challenging Problem: The paper tackles the critical gap between NCO research (small VRPs) and industrial applications (large-scale RWVRPs). Solving 1000+ node RWVRPs is a major milestone. Novel Architecture: The CPA innovation (based on geometric priors of polar coordinates) is an insightful design. Meanwhile, decoupling node-level sequential attention from edge-level global features is an effective way to handle heterogeneous constraints. Strong Experimental Results: Extensive
1. Incomplete Complexity Analysis: The paper claims O(n) complexity (i.e., O(nRM)) in Equation (6). This conclusion relies on R and M being fixed constants. However, the authors do not discuss the boundary condition where R × M > n, in which case O(nRM) would not hold. A rigorous theoretical analysis is encouraged. 2. Insufficient Analytical Justification: The paper lacks rigorous analytical evidence to explain the source of its strong empirical performance. It remains unclear whether the "scala
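The reviewer's boundary condition can be checked by counting query-key scores directly. The helpers below are a hypothetical cost model, assuming each of R rounds partitions the n nodes into clusters of at most M and scores all pairs inside each cluster; they are not taken from the paper's Equation (6).

```python
def dense_scores(n: int) -> int:
    """Query-key scores for ordinary self-attention over n nodes."""
    return n * n


def clustered_scores(n: int, cluster_size: int, rounds: int) -> int:
    """Query-key scores for a hypothetical clustered attention: every round splits
    n nodes into full clusters of `cluster_size` plus one remainder cluster, and
    scores all pairs inside each cluster."""
    full, rem = divmod(n, cluster_size)
    per_round = full * cluster_size ** 2 + rem ** 2
    return rounds * per_round


for n, m, r in [(1000, 50, 4), (100, 50, 4), (100, 128, 2)]:
    ratio = clustered_scores(n, m, r) / dense_scores(n)
    print(f"n={n} M={m} R={r}: R*M={r * m}, clustered/dense = {ratio:.2f}")
```

The three cases give ratios of 0.20, 2.00, and 2.00. Under this model the clustered cost never exceeds R·n·min(M, n), and it is strictly cheaper than the n² dense cost exactly when R·M < n; once M ≥ n, each round degenerates to full attention and R rounds cost R·n².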
Taxonomy
Topics: Vehicle Routing Optimization Methods · Vehicular Ad Hoc Networks (VANETs) · Advanced Neural Network Applications
