Sequential Structure in Intraday Futures Data: LSTM vs Gradient Boosting on MNQ
Mathias Mesfin

TL;DR
This study tests whether LSTM and gradient boosting models can predict intraday direction in MNQ futures from five-minute OHLCV bars. Across 944 trading days from 2021-2025, no model shows a statistically significant predictive edge.
Contribution
The paper provides an empirical assessment of a Kronos-inspired architecture on real-world data, indicating that a single-instrument dataset of this size (944 trading days) sits below the data scale needed for effective sequential financial ML.
Findings
No configuration achieved statistically significant out-of-sample accuracy above the 51.8% base rate.
Permutation tests yielded high p-values, indicating no significant predictive edge.
Feature importance was unstable, suggesting that the models fit noise rather than captured a stable signal.
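To make the permutation-test finding concrete, here is a minimal sketch of a label-permutation test on classification accuracy. The summary does not say which permutation scheme the paper uses, so the function names, the number of permutations, and the one-sided p-value convention below are all assumptions for illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """One-sided p-value: how often shuffled labels score at least as well.

    Hypothetical helper, not the paper's code. Shuffling the labels breaks
    any real link between predictions and outcomes, so the shuffled
    accuracies approximate the null distribution of "no predictive edge".
    """
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            hits += 1
    # The +1 terms count the observed labeling as one of the permutations,
    # which keeps the p-value strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

A predictor that always outputs the same class scores identically under every shuffle, so this test returns a p-value of 1.0 for it. That is the sense in which accuracy close to the base rate comes with high p-values.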
Abstract
This paper compares gradient boosting and long short-term memory (LSTM) architectures for intraday directional prediction in Micro E-Mini Nasdaq 100 futures (MNQ). Motivated by recent foundation-model research on financial candlestick data, including the Kronos architecture, we test whether five-minute OHLCV bar sequences contain exploitable sequential predictive structure at the scale of a single-instrument dataset. Using 944 trading days from 2021-2025, four model configurations are evaluated under strict expanding-window walk-forward validation across three out-of-sample periods. The target variable is whether the session close exceeds the 10:30 AM open by more than ten points. No configuration produces statistically significant out-of-sample accuracy above the 51.8% base rate. Combined OOS accuracies range from 50.00% to 50.89% across gradient boosting variants, while the LSTM…
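The target definition and the expanding-window walk-forward scheme described in the abstract can be sketched as follows. Only the ten-point threshold, the 944 trading days, and the three out-of-sample periods come from the abstract. The function names and the 150-day test period length are hypothetical, since the paper's actual fold boundaries are not given here.

```python
def directional_label(open_1030, session_close, threshold=10.0):
    """Return 1 if the close exceeds the 10:30 AM open by more than threshold points."""
    return int(session_close - open_1030 > threshold)


def expanding_window_splits(n_samples, n_test_periods, test_size):
    """Build (train, test) index ranges for expanding-window walk-forward validation.

    Each training window starts at day 0 and ends right before its test
    period, so every model is fit only on data that precedes what it predicts.
    test_size is an assumed parameter, not a value taken from the paper.
    """
    first_test_start = n_samples - n_test_periods * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested test periods")
    splits = []
    for k in range(n_test_periods):
        test_start = first_test_start + k * test_size
        train = range(0, test_start)
        test = range(test_start, test_start + test_size)
        splits.append((train, test))
    return splits


# Example: 944 trading days, three out-of-sample periods of an assumed 150 days each.
for train, test in expanding_window_splits(944, 3, 150):
    print(f"train days 0-{train.stop - 1}, test days {test.start}-{test.stop - 1}")
```

Because each training window grows to absorb the previous test period, later folds train on more data than earlier ones, and no fold ever sees future bars.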
