GAMA: A Neural Neighborhood Search Method with Graph-aware Multi-modal Attention for Vehicle Routing Problem
Xiangling Chen, Yi Mei, Mengjie Zhang

TL;DR
GAMA introduces a graph-aware multi-modal attention framework for neural neighborhood search in vehicle routing, effectively capturing complex structural information and improving solution quality over existing methods.
Contribution
It presents a novel multi-modal attention approach with graph neural networks and gated fusion for enhanced VRP solving, surpassing prior neural methods.
Findings
GAMA significantly outperforms recent neural baselines on synthetic and benchmark VRP instances.
The multi-modal attention mechanism improves the encoding of structural and semantic information.
Ablation studies validate the importance of the attention and fusion components for performance gains.
Abstract
Recent advances in neural neighborhood search methods have shown potential in tackling Vehicle Routing Problems (VRPs). However, most existing approaches rely on simplistic state representations and fuse heterogeneous information via naive concatenation, limiting their ability to capture rich structural and semantic context. To address these limitations, we propose GAMA, a neural neighborhood search method for VRPs built on a Graph-aware Multi-modal Attention model. GAMA encodes the problem instance and its evolving solution as distinct modalities using graph neural networks, and models their intra- and inter-modal interactions through stacked self- and cross-attention layers. A gated fusion mechanism further integrates the multi-modal representations into a structured state, enabling the policy to make informed and generalizable operator selection decisions. Extensive experiments conducted…
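The cross-attention and gated fusion steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the gate's form, so the sigmoid gate over concatenated embeddings, the projection-free single-head attention, the residual connections, and every name here (`cross_attention`, `gated_fusion`, `h_inst`, `h_sol`, `W`, `b`) are assumptions chosen for illustration. Random vectors stand in for the graph neural network outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    # Single-head scaled dot-product attention with no learned projections
    # (an assumption made to keep the sketch short).
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ context

def gated_fusion(h_inst, h_sol, W, b):
    # Hypothetical gate: sigmoid of a linear map over the concatenated
    # modalities, used to blend them element-wise.
    z = np.concatenate([h_inst, h_sol], axis=-1) @ W + b
    gate = 1.0 / (1.0 + np.exp(-z))
    return gate * h_inst + (1.0 - gate) * h_sol

rng = np.random.default_rng(0)
n_nodes, dim = 5, 8

# Stand-ins for the instance-graph and solution-graph node embeddings.
h_inst = rng.normal(size=(n_nodes, dim))
h_sol = rng.normal(size=(n_nodes, dim))

# Each modality attends to the other; the residual add is an assumption.
inst_ctx = h_inst + cross_attention(h_inst, h_sol)
sol_ctx = h_sol + cross_attention(h_sol, h_inst)

W = rng.normal(scale=0.1, size=(2 * dim, dim))
b = np.zeros(dim)
state = gated_fusion(inst_ctx, sol_ctx, W, b)
print(state.shape)
```

Unlike plain concatenation, a gate of this kind lets each feature dimension weight the two modalities differently, so the fused state is always an element-wise convex combination of the two inputs.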
Peer Reviews
Decision · Submitted to ICLR 2026
The proposed multi-modal encoder architecture is a primary strength. The conceptual separation of the static instance graph and the dynamic solution graph is well-motivated, and the use of explicit cross-attention and gated fusion to integrate them is a logical and novel contribution. The ablation study effectively validates this design, demonstrating that both the cross-attention and gated fusion components contribute positively to the final solution quality.
The method's primary contribution is fundamentally undermined by its prohibitive computational cost. The results in Table 1 show that GAMA requires 6.6 days of inference time to solve CVRP100 instances. This is juxtaposed against the 33 minutes required by the DACT baseline and 4.5 hours by the classical HGS solver. An approximate 288-fold increase in runtime compared to DACT for a marginal 0.26% improvement in solution quality represents an unjustifiable trade-off, rendering the method unusable
1. This paper models the problem instance and solution graphs as distinct modalities, then uses self-attention and cross-attention to learn intra-modality and inter-modality dependencies. These design choices are well-motivated and clearly improve information flow between modalities.
2. Experimental results on standard VRP benchmarks show consistent improvements over strong neural and heuristic baselines.
1. This paper has some conceptual overlap with DACT [1] and N2S [2]. The ideas of separating instance and solution information and learning them through both self-attention and cross-attention are very similar to DACT.
2. The experiments focus only on the CVRP; results on other representative VRP variants are expected.
3. Compared with DACT, the superiority of the proposed GAMA is not obvious, especially considering both optimality gaps and computational efficiency. For example, wi
1. The paper is well-written, and the proposed architecture is clearly explained.
2. The motivation is sound. Identifying the static problem instance and the dynamic solution as two distinct modalities is an intuitive and sensible approach.
1. The practical value of the proposed method is questionable when analyzing the results in Table 1.
(1) GAMA vs. other L2I methods: On CVRP100 (T=20k), GAMA achieves an average cost of 15.6510 in 6.6 days. DACT (T=20k) achieves a very close 15.6925 in only 33 minutes. GAMA is approximately 288 times slower for a marginal 0.26% improvement in solution quality.
(2) GAMA vs. classical solvers: The classical solver HGS achieves an average cost of 15.6994 in 4.5 hours. GAMA not only fails to significantly be
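The runtime and quality figures quoted by the reviewers can be reproduced with a few lines of arithmetic. The sketch below uses only the Table 1 numbers cited in the reviews above; the variable names are illustrative, and the HGS comparison is computed in the same way as the DACT one as an assumption about how the reviewers derived their figures.

```python
# Figures quoted in the reviews (CVRP100, Table 1 of the paper).
gama_cost, gama_minutes = 15.6510, 6.6 * 24 * 60   # 6.6 days
dact_cost, dact_minutes = 15.6925, 33.0            # 33 minutes
hgs_cost, hgs_minutes = 15.6994, 4.5 * 60          # 4.5 hours

def slowdown(slow_minutes, fast_minutes):
    # How many times longer the slower method runs.
    return slow_minutes / fast_minutes

def relative_improvement(baseline_cost, new_cost):
    # Cost reduction as a percentage of the baseline cost.
    return (baseline_cost - new_cost) / baseline_cost * 100.0

print(f"GAMA vs. DACT: {slowdown(gama_minutes, dact_minutes):.0f}x slower, "
      f"{relative_improvement(dact_cost, gama_cost):.2f}% lower cost")
print(f"GAMA vs. HGS:  {slowdown(gama_minutes, hgs_minutes):.1f}x slower, "
      f"{relative_improvement(hgs_cost, gama_cost):.2f}% lower cost")
```

The first line of output matches the 288-fold slowdown and 0.26% improvement stated in the reviews.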
Taxonomy
Topics: Vehicle Routing Optimization Methods · Vehicular Ad Hoc Networks (VANETs) · Advanced Neural Network Applications
