XXLTraffic: Expanding and Extremely Long Traffic forecasting beyond test adaptation
Du Yin, Hao Xue, Arian Prabowo, Shuang Ao, Flora Salim

TL;DR
This paper introduces XXLTraffic, the largest public traffic dataset with the longest timespan, designed to enable research on extremely long-term traffic forecasting beyond traditional test adaptation scenarios.
Contribution
The paper presents XXLTraffic, a new extensive dataset that captures evolving traffic patterns and supports long-term forecasting research beyond existing datasets.
Findings
Largest public traffic dataset with the longest timespan, collected from Los Angeles, USA, and New South Wales, Australia
Supports both typical and novel long-term forecasting configurations
Facilitates development of models for practical, real-world traffic prediction
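One of the configurations the reviews below mention is forecasting across a time gap between the observed input and the predicted target. The following is a minimal sketch of how such input/target pairs could be sliced from a sensor time series. The function name `make_gap_windows`, its parameters, and the toy data are illustrative assumptions, not the authors' benchmark code.

```python
import numpy as np

def make_gap_windows(series, input_len, gap, horizon, stride=1):
    """Slice a (time, sensors) array into input/target pairs.

    Each target window starts `gap` steps after its input window ends.
    Hypothetical helper for illustration; not taken from the paper.
    """
    series = np.asarray(series)
    span = input_len + gap + horizon  # total steps covered by one sample
    inputs, targets = [], []
    for start in range(0, series.shape[0] - span + 1, stride):
        target_start = start + input_len + gap
        inputs.append(series[start:start + input_len])
        targets.append(series[target_start:target_start + horizon])
    # np.stack raises ValueError if the series is shorter than `span`
    return np.stack(inputs), np.stack(targets)

# Toy data (assumed): 20 time steps from 2 sensors
toy = np.arange(40).reshape(20, 2)
x, y = make_gap_windows(toy, input_len=4, gap=6, horizon=2)
print(x.shape, y.shape)
```

Setting `gap=0` reduces this to the usual contiguous forecasting setup, so the same helper would cover both the typical and the gap-based configurations.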
Abstract
Traffic forecasting is crucial for smart cities and intelligent transportation initiatives, where deep learning has made significant progress in modeling complex spatio-temporal patterns in recent years. However, current public datasets have limitations in reflecting the distribution shift nature of real-world scenarios, characterized by continuously evolving infrastructures, varying temporal distributions, and long temporal gaps due to sensor downtimes or changes in traffic patterns. These limitations inevitably restrict the practical applicability of existing traffic forecasting datasets. To bridge this gap, we present XXLTraffic, the largest available public traffic dataset with the longest timespan, collected from Los Angeles, USA, and New South Wales, Australia, and curated to support research in extremely long forecasting beyond test adaptation. Our benchmark includes both typical…
Peer Reviews
Decision · ICLR 2025 Conference Withdrawn Submission
1. A brand-new dataset is proposed, which is the largest available public traffic dataset in extremely long traffic forecasting. 2. Numerous comparative experiments validate the differences in performance of different models on this dataset, providing a good guide for peers to understand the latest technological developments.
1. The maximum number of nodes in the proposed dataset is smaller than that of LargeST, which seems to need improvement; after all, real road networks are very large. 2. I think it is necessary for the authors to explain in more depth the motivation for constructing such a long-term dataset, because as transportation infrastructure advances and human travel modes change, traffic data that is too old is not always helpful for understanding future transportation patterns. 3. Due to the sheer
(1) A new traffic dataset is proposed, comprising data from numerous regions across California and New South Wales over an extended time period. (2) The temporal distribution evolution of selected data was visualized, revealing the evolutionary characteristics of the data. (3) Tests were conducted on relevant prediction tasks, including scenarios with time gaps and varying input lengths.
(1) Compared to existing works like LargeST, this work's contribution is insufficient as it merely expands the temporal and spatial scope of data collection. These data can be obtained through the open-source PEMS system in the same way. (2) Although the number of regions and time span are substantial, the vast majority of data is limited to California. Including data from more cities and countries might have been a better approach. (3) Most baselines in the experiments are specifically design
1. The proposed XXLTraffic is the largest publicly available traffic dataset so far. The PEMS part spans up to 23 years and the NSW part also has around 11 years’ data. 2. The authors evaluated the performance of existing baselines on the proposed datasets with various settings. The overall results verify the importance of introducing large-scale datasets with extremely long time spans.
1. The most significant weakness is that this paper has no novelty. Although this conference has a datasets and benchmarks area, I do not think that, for a top-tier conference, it is enough to propose a large-scale dataset without any model contribution. Given that existing large-scale datasets are available that merely have fewer nodes or shorter time spans than XXLTraffic, my concern becomes more intense. 2. Some motivations remain unclear. In the introduction section, the authors ment
Taxonomy
Topics: Traffic Prediction and Management Techniques · Time Series Analysis and Forecasting · Big Data Technologies and Applications
