A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows
Linjiang Cao, Maonan Wang, Xi Xiong

TL;DR
This paper introduces a novel LLM-enhanced Q-learning framework for solving the NP-hard CVRPTW, combining adaptive training, self-correction, and prioritized experience replay to improve solution quality and training efficiency.
Contribution
It presents a new LLM-guided Q-learning approach for CVRPTW that combines two-phase training with a self-correction mechanism, advancing solution methods for complex routing problems.
Findings
Achieves 7.3% average cost reduction over traditional Q-learning.
Requires fewer training steps for convergence.
Demonstrates effectiveness under real-time emergency constraints.
Abstract
The Capacitated Vehicle Routing Problem with Time Windows (CVRPTW) is a classic NP-hard combinatorial optimization problem widely applied in logistics distribution and transportation management. Its complexity stems from the constraints of vehicle capacity and time windows, which pose significant challenges to traditional approaches. Advances in Large Language Models (LLMs) provide new possibilities for finding approximate solutions to CVRPTW. This paper proposes a novel LLM-enhanced Q-learning framework to address the CVRPTW with real-time emergency constraints. Our solution introduces an adaptive two-phase training mechanism that transitions from an LLM-guided exploration phase to an autonomous optimization phase of the Q-network. To ensure reliability, we design a three-tier self-correction mechanism based on Chain-of-Thought (CoT) reasoning for LLMs: syntactic validation, semantic…
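To make the two-phase idea in the abstract concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It uses a tabular Q-function rather than the paper's Q-network, and it switches phases at a fixed step count (`switch_step`), whereas the paper describes the transition as adaptive without giving the rule here. The names `select_action`, `q_update`, `llm_suggest`, and `switch_step` are all hypothetical, and `llm_suggest` stands in for whatever LLM call proposes the next customer.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical state encoding, e.g. (current_node, ...); not taken from the paper.
State = Tuple[int, ...]
QTable = Dict[Tuple[State, int], float]


def select_action(
    q: QTable,
    state: State,
    feasible: List[int],
    step: int,
    switch_step: int,
    llm_suggest: Callable[[State, List[int]], int],
    epsilon: float = 0.1,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick the next customer from the feasible candidates.

    Phase 1 (step < switch_step): follow the LLM suggestion if it is usable.
    Phase 2 (step >= switch_step): epsilon-greedy over the learned Q-values.
    The fixed threshold is an assumption made for illustration only.
    """
    if rng is None:
        rng = random.Random()
    if step < switch_step:
        suggestion = llm_suggest(state, feasible)
        # Minimal validity check on the LLM output; the paper's three-tier
        # self-correction is richer than this single membership test.
        if suggestion in feasible:
            return suggestion
    # Phase 2, or fallback when the suggestion is unusable.
    if rng.random() < epsilon:
        return rng.choice(feasible)
    return max(feasible, key=lambda a: q.get((state, a), 0.0))


def q_update(
    q: QTable,
    state: State,
    action: int,
    reward: float,
    next_state: State,
    next_feasible: List[int],
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> None:
    """Standard one-step Q-learning update on a tabular Q-function."""
    best_next = max(
        (q.get((next_state, a), 0.0) for a in next_feasible), default=0.0
    )
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```

In this sketch the reward would typically be the negative travel cost of the chosen move, and `feasible` would already exclude customers that violate the remaining vehicle capacity or their time window.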
Taxonomy
Topics: Vehicle Routing Optimization Methods · Software-Defined Networks and 5G · Vehicular Ad Hoc Networks (VANETs)
Methods: Q-Learning
