Hidden Leaks in Time Series Forecasting: How Data Leakage Affects LSTM Evaluation Across Configurations and Validation Strategies
Salma Albelali, Moataz Ahmed

TL;DR
This paper demonstrates that data leakage inflates performance estimates in LSTM time series forecasting evaluations, and that validation strategy and configuration choices strongly affect how sensitive those estimates are to leakage.
Contribution
The study systematically analyzes how validation strategy and configuration choices influence the effect of data leakage on LSTM time series evaluation, and highlights the importance of leakage-resistant evaluation methods.
Findings
Under 10-fold cross-validation, leakage produces an apparent RMSE gain of up to 20.5%
2-way and 3-way splits are more robust, with leakage-induced RMSE gains below 5%
Smaller input windows and longer lags increase leakage risk
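One plausible mechanism behind the much larger gain under 10-fold cross-validation can be shown with a small counting sketch. It assumes contiguous, unshuffled folds over windows built before splitting (the same fold layout as scikit-learn's `KFold(shuffle=False)`); the paper's exact fold construction is not stated in this summary. The helpers `make_windows`, `contiguous_folds` and `target_exposure_per_fold` are hypothetical, and the series must hold unique values (here an index series) so that a value identifies its time step.

```python
def make_windows(series, window, horizon=1):
    """(inputs, target) pairs built over the whole series, i.e. before any split (the leaky order)."""
    return [
        (series[s:s + window], series[s + window + horizon - 1])
        for s in range(len(series) - window - horizon + 1)
    ]


def contiguous_folds(n_items, k):
    """Split indices 0..n_items-1 into k contiguous blocks (no shuffling); sizes differ by at most one."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def target_exposure_per_fold(series, window, k=10, horizon=1):
    """For each fold, the fraction of test targets that also appear as inputs of some training window.

    Assumes `series` has unique values, so membership by value equals membership by time step.
    """
    pairs = make_windows(series, window, horizon)
    exposure = []
    for test_idx in contiguous_folds(len(pairs), k):
        test_set = set(test_idx)
        # Collect every raw value the training windows feed to the model as input.
        train_inputs = set()
        for i, (inputs, _) in enumerate(pairs):
            if i not in test_set:
                train_inputs.update(inputs)
        test_targets = [pairs[i][1] for i in test_idx]
        seen = sum(t in train_inputs for t in test_targets)
        exposure.append(seen / len(test_targets))
    return exposure


if __name__ == "__main__":
    print(target_exposure_per_fold(list(range(100)), window=10, k=10))
```

With 100 steps, a 10-step window and 10 folds, every test target in the first nine folds also appears as an input of some training window (exposure 1.0), because training windows that start just after the test block contain those values. Only the final fold, which has no later training windows, scores 0.0. That final fold matches the arrangement of a chronological holdout, which is consistent with, though not proof of, the smaller leakage gains reported for the split-based strategies.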
Abstract
Deep learning models, particularly Long Short-Term Memory (LSTM) networks, are widely used in time series forecasting due to their ability to capture complex temporal dependencies. However, evaluation integrity is often compromised by data leakage, a methodological flaw in which input-output sequences are constructed before dataset partitioning, allowing future information to unintentionally influence training. This study investigates the impact of data leakage on performance, focusing on how validation design mediates leakage sensitivity. Three widely used validation techniques (2-way split, 3-way split, and 10-fold cross-validation) are evaluated under both leaky (pre-split sequence generation) and clean conditions, with the latter mitigating leakage risk by enforcing temporal separation during data splitting prior to sequence construction. The effect of leakage is assessed using RMSE…
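The leaky and clean conditions described in the abstract differ only in the order of two steps: building input-output windows and splitting the data. The sketch below illustrates this on a 2-way split. The helpers `make_windows`, `leaky_split`, `clean_split` and `touched_steps` are hypothetical, and one-step-ahead targets and an 80/20 chronological split are assumptions, not the paper's reported settings. It uses an index series (each value equals its time step) so that raw time steps shared by training and test windows can be counted directly.

```python
def make_windows(series, window, horizon=1):
    """Return (inputs, target) pairs: `window` consecutive values, then the value `horizon` steps ahead."""
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs


def leaky_split(series, window, train_frac=0.8, horizon=1):
    # Leaky: build windows over the WHOLE series first, then cut the list of windows.
    pairs = make_windows(series, window, horizon)
    cut = int(len(pairs) * train_frac)
    return pairs[:cut], pairs[cut:]


def clean_split(series, window, train_frac=0.8, horizon=1):
    # Clean: cut the raw series chronologically first, then window each part separately.
    cut = int(len(series) * train_frac)
    return make_windows(series[:cut], window, horizon), make_windows(series[cut:], window, horizon)


def touched_steps(pairs):
    """Set of raw time steps (values of an index series) used as inputs or targets."""
    steps = set()
    for inputs, target in pairs:
        steps.update(inputs)
        steps.add(target)
    return steps


if __name__ == "__main__":
    series = list(range(100))  # index series: each value equals its time step
    for name, split in [("leaky", leaky_split), ("clean", clean_split)]:
        train, test = split(series, window=10)
        shared = touched_steps(train) & touched_steps(test)
        print(name, len(train), len(test), len(shared))
```

On 100 steps with a 10-step window, the leaky split shares 10 raw time steps between training and test windows (training targets reach into the test period), while the clean split shares none. The clean split also yields fewer windows (70 + 10 instead of 90), because no window is allowed to straddle the split boundary.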
Taxonomy
Topics: Traffic Prediction and Management Techniques · Forecasting Techniques and Applications · Time Series Analysis and Forecasting
