Learning to Solve Orienteering Problem with Time Windows and Variable Profits

Songqun Gao; Zanxi Ruan; Patrick Floor; Marco Roveri; Luigi Palopoli; Daniele Fontanelli

arXiv:2603.06260·cs.LG·March 9, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces DeCoST, a learning-based two-stage optimization method for the complex orienteering problem with time windows and variable profits, improving solution quality and speed over existing methods.

Contribution

DeCoST effectively decouples discrete and continuous variables in OPTWVP, enabling efficient learning and optimization with proven global optimality for the second stage.

Findings

01 Outperforms state-of-the-art solvers in solution quality

02 Achieves up to 6.6x faster inference speed

03 Enhances solution quality across various constructive solvers

Abstract

The orienteering problem with time windows and variable profits (OPTWVP) is common in many real-world applications and involves continuous time variables. Current approaches fail to develop an efficient solver for this orienteering problem variant with discrete and continuous variables. In this paper, we propose a learning-based two-stage DEcoupled discrete-Continuous optimization with Service-time-guided Trajectory (DeCoST), which aims to effectively decouple the discrete and continuous decision variables in the OPTWVP problem, while enabling efficient and learnable coordination between them. In the first stage, a parallel decoding structure is employed to predict the path and the initial service time allocation. The second stage optimizes the service times through a linear programming (LP) formulation and provides a long-horizon learning of structure estimation. We rigorously prove…
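As a rough illustration of what a second-stage service-time LP could look like, the sketch below fixes the route produced by the first stage and optimizes only the continuous variables. It is not the paper's formulation. All symbols are assumptions introduced here: $a_k$ is the service start time at the $k$-th visited node, $d_k$ the service duration, $p_k$ a constant profit rate (profit assumed linear in service duration), $[e_k, l_k]$ the time window (service assumed to finish inside it), $t_{k,k+1}$ the travel time between consecutive nodes, $\bar{d}_k$ a maximum service duration, and $T_{\max}$ the total time budget.

```latex
% Hypothetical notation, not taken from the paper.
% Fixed route v_1, ..., v_n from the first stage.
% Decision variables: a_k (service start at v_k), d_k (service duration at v_k).
\begin{align*}
\max_{a,\,d}\quad & \sum_{k=1}^{n} p_k\, d_k \\
\text{s.t.}\quad
 & a_k + d_k + t_{k,k+1} \le a_{k+1}, && k = 1,\dots,n-1 \\
 & e_k \le a_k, \quad a_k + d_k \le l_k, && k = 1,\dots,n \\
 & 0 \le d_k \le \bar{d}_k, && k = 1,\dots,n \\
 & a_n + d_n \le T_{\max}
\end{align*}
```

Under these assumptions the objective and every constraint are linear in $(a, d)$, so for a fixed visiting order the service-time subproblem is an LP. The first constraint is an inequality, which permits waiting before a time window opens.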

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 2

Strengths

The authors propose a novel, well-designed two-stage framework that cleanly separates discrete and continuous decision-making for the orienteering problem with time windows and variable profits. It also combines rigorous theoretical guarantees, via the LP formulation in the second stage, with extensive experiments supporting the results.

Weaknesses

One weakness of the paper is that, unlike the second stage, the first-stage routing policy has no theoretical guarantee of optimality. It relies on reinforcement learning, which may converge to only locally optimal routes, so the overall solution quality depends on the effectiveness of this learned policy without any formal performance bound.

Reviewer 02 · Rating 4 · Confidence 5

Strengths

1. Clear and structured exposition: The paper is well written, easy to follow, and methodologically consistent. The decomposition into discrete and continuous components is intuitively appealing and technically well-motivated.

2. Solid engineering contribution: The combination of a neural constructive method with an LP-based continuous-time refinement is elegant and appears to yield computational gains.

Weaknesses

1. Motivation for optimizing service times unclear: In standard formulations of the orienteering problem with time windows, service times are typically derived from route structure and scheduling constraints. It remains unclear why an explicit optimization of service times is required and how this impacts practical applicability. A full formal problem statement would clarify the modeling choices.

2. Limited methodological novelty: While the decoupling approach is well-implemented, the paradigm

Reviewer 03 · Rating 2 · Confidence 3

Strengths

* The polynomial-time algorithm for finding service times for a given route is a nice contribution, and it is proven to be optimal.

* The combination of the RL policy with the polynomial-time algorithm is beneficial (while there exist approaches that use traditional optimization methods to improve solutions provided by NCO for similar problems, there is novelty in the specific combination, which decouples the routing from the scheduling).

* The numerical results are promising.

Weaknesses

* The key contribution of the paper seems to be the polynomial-time algorithm for finding service times. While this is an interesting contribution, ICLR might not be the best venue for publishing it since it is not a learning-based approach. A broader AI conference might be a better fit.

* The significance of this contribution is limited by the fact that the problem can also be solved by a straightforward LP, which also takes polynomial time. So this contribution really only seems important to commun

Taxonomy

Topics: Vehicle Routing Optimization Methods · Constraint Satisfaction and Optimization · Metaheuristic Optimization Algorithms Research