Learning Shortest Paths When Data is Scarce

Dmytro Matsypura; Yu Pan; Hanzhao Wang

arXiv:2601.03629 · cs.LG · January 8, 2026

TL;DR

This paper estimates edge costs for shortest-path routing in large networks by combining abundant simulator samples with scarce real-world observations. It models the simulator's bias as varying smoothly over an edge-similarity graph, and it provides theoretical guarantees and an active learning strategy.

Contribution

The paper proposes a Laplacian-regularized estimator of the simulator-to-reality bias, establishes finite-sample error bounds, and develops an active learning algorithm for routing when real-world data is scarce.

Findings

01 Effective bias calibration in data-scarce regimes

02 Theoretical error bounds and suboptimality guarantees

03 Successful experiments on road and traffic networks

Abstract

Digital twins and other simulators are increasingly used to support routing decisions in large-scale networks. However, simulator outputs often exhibit systematic bias, while ground-truth measurements are costly and scarce. We study a stochastic shortest-path problem in which a planner has access to abundant synthetic samples, limited real-world observations, and an edge-similarity structure capturing expected behavioral similarity across links. We model the simulator-to-reality discrepancy as an unknown, edge-specific bias that varies smoothly over the similarity graph, and estimate it using Laplacian-regularized least squares. This approach yields calibrated edge cost estimates even in data-scarce regimes. We establish finite-sample error bounds, translate estimation error into path-level suboptimality guarantees, and propose a computable, data-driven certificate that verifies…
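To make the estimation step concrete, here is a minimal sketch of Laplacian-regularized least squares for an edge-wise bias. It is an illustration under stated assumptions, not the authors' implementation: the objective (count-weighted squared error plus a penalty lam * b^T L b), the function names `similarity_laplacian` and `estimate_bias`, the parameter `lam`, and the toy four-edge chain are all hypothetical choices made for this example.

```python
import numpy as np

def similarity_laplacian(num_edges, weighted_pairs):
    """Graph Laplacian L = D - W of an edge-similarity graph.

    Each node of this graph is a network edge; weighted_pairs holds
    (i, j, w) triples saying edges i and j are similar with weight w.
    """
    W = np.zeros((num_edges, num_edges))
    for i, j, w in weighted_pairs:
        W[i, j] = W[j, i] = w
    return np.diag(W.sum(axis=1)) - W

def estimate_bias(sim_mean, real_mean, real_count, L, lam):
    """Assumed objective (hypothetical, for illustration):

        minimize over b:  sum_e n_e * (d_e - b_e)^2 + lam * b^T L b

    where d_e = real_mean_e - sim_mean_e on observed edges and n_e is
    the number of real observations. The first-order condition is the
    linear system (N + lam * L) b = N d with N = diag(n_e).
    """
    counts = np.asarray(real_count, dtype=float)
    observed = counts > 0
    diff = np.zeros_like(counts)
    diff[observed] = (np.asarray(real_mean, dtype=float)[observed]
                      - np.asarray(sim_mean, dtype=float)[observed])
    A = np.diag(counts) + lam * L
    # A is invertible when lam > 0, the similarity graph is connected,
    # and at least one edge has a real observation.
    return np.linalg.solve(A, counts * diff)

# Toy data: four edges whose similarity graph is the chain 0-1-2-3.
L = similarity_laplacian(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])
sim_mean = np.array([10.0, 10.0, 10.0, 10.0])    # abundant simulator samples
real_mean = np.array([12.0, np.nan, np.nan, 14.0])  # real data on edges 0 and 3 only
real_count = np.array([5, 0, 0, 5])

bias = estimate_bias(sim_mean, real_mean, real_count, L, lam=1.0)
calibrated_cost = sim_mean + bias  # input to any shortest-path solver
```

In this toy case the two unobserved edges receive bias values interpolated between their observed neighbors, which is the effect the smoothness assumption is meant to produce. A larger `lam` pulls all bias estimates toward a common value, while a smaller `lam` trusts the observed differences more.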

Taxonomy

Topics: Traffic Prediction and Management Techniques · Traffic control and management · Privacy-Preserving Technologies in Data