# Benchmark Dataset for Mid-Price Forecasting of Limit Order Book Data with Machine Learning Methods

**Authors:** Adamantios Ntakaris, Martin Magris, Juho Kanniainen, Moncef Gabbouj, Alexandros Iosifidis

arXiv: 1705.03233 · 2020-03-12

## TL;DR

This paper introduces the first publicly available large-scale benchmark dataset of high-frequency limit order book data for mid-price prediction, enabling standardized evaluation of machine learning methods in financial markets.

## Contribution

It provides a comprehensive, normalized dataset of five stocks' limit order book data and a cross-validation protocol for benchmarking predictive models.
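To make the cross-validation protocol concrete, here is a minimal sketch of day-based anchored splitting. It assumes that "anchored" means the training window always starts at the first day and grows by one day per fold, with the following day held out for testing. The function name `anchored_day_splits` is hypothetical, and the exact fold definition should be checked against the full paper.

```python
from typing import Iterator, List, Tuple


def anchored_day_splits(days: List[int]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_days, test_days) pairs for an anchored, day-based protocol.

    Assumption: each fold trains on all days from the first day up to day k
    and tests on day k+1, so the training window is anchored at the start.
    """
    for k in range(1, len(days)):
        yield days[:k], [days[k]]


# Ten consecutive trading days, as in the dataset description.
days = list(range(1, 11))
splits = list(anchored_day_splits(days))

for train, test in splits:
    print(f"train days {train[0]}-{train[-1]} -> test day {test[0]}")
```

Under these assumptions, ten days produce nine folds, and no fold ever trains on data that comes after its test day, which is what keeps the evaluation free of look-ahead bias.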

## Key findings

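- Baseline approaches are evaluated on the dataset to provide reference results for later comparisons.
- The dataset and its day-based anchored cross-validation protocol allow like-for-like comparison of state-of-the-art forecasting methods.
- The authors expect the dataset to serve as a testbed for new expert-system solutions for high-frequency limit order book analysis.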

## Abstract

Managing the prediction of metrics in high-frequency financial markets is a challenging task. An efficient way to do so is to monitor the dynamics of a limit order book to identify the information edge. This paper describes the first publicly available benchmark dataset of high-frequency limit order markets for mid-price prediction. We extracted normalized data representations of time series data for five stocks from the NASDAQ Nordic stock market for a time period of ten consecutive days, leading to a dataset of ~4,000,000 time series samples in total. A day-based anchored cross-validation experimental protocol is also provided that can be used as a benchmark for comparing the performance of state-of-the-art methodologies. The performance of baseline approaches is also provided to facilitate experimental comparisons. We expect that such a large-scale dataset can serve as a testbed for devising novel solutions of expert systems for high-frequency limit order book data analysis.
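The mid-price targeted in the abstract is conventionally the average of the best bid and best ask prices. The sketch below computes it and turns a future mid-price change into a direction label. The helper names, the three-class labelling, and the `threshold` parameter are illustrative assumptions, not the paper's exact labelling rule.

```python
def mid_price(best_bid: float, best_ask: float) -> float:
    """Return the average of the best bid and best ask prices."""
    if best_ask < best_bid:
        raise ValueError("crossed book: best ask is below best bid")
    return (best_bid + best_ask) / 2.0


def direction(current_mid: float, future_mid: float, threshold: float = 0.0) -> str:
    """Label the relative mid-price change as 'up', 'down', or 'stationary'.

    Assumption: a move counts as 'up' or 'down' only when the relative change
    exceeds `threshold` in magnitude (a hypothetical parameter).
    """
    change = (future_mid - current_mid) / current_mid
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "stationary"


now = mid_price(best_bid=99.0, best_ask=101.0)
later = mid_price(best_bid=99.5, best_ask=101.5)
print(now, later, direction(now, later, threshold=0.001))
```

A non-zero threshold keeps very small price moves from being labelled as directional changes. The value used here is arbitrary.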

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.03233/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1705.03233/full.md

## References

62 references — full list in the complete paper: https://tomesphere.com/paper/1705.03233/full.md

---
Source: https://tomesphere.com/paper/1705.03233