Unlocking the Forecasting Economy: A Suite of Datasets for the Full Lifecycle of Prediction Market: [Experiments & Analysis]
Huaiyu Jia, Luofeng Zhou, Wentao Zhang, Lin William Cong, Siguang Li, Shuo Sun

TL;DR
This paper introduces a comprehensive, continuously updated dataset for the full lifecycle of decentralized prediction markets on Polymarket, enabling detailed analysis and research in this domain.
Contribution
It presents the first unified, large-scale dataset suite covering all stages of prediction markets, with robust data integration and synchronization methods.
Findings
Dataset includes over 770,000 market records and 943 million trading entries.
Demonstrated utility through NBA outcome calibration and CPI expectation case studies.
Provides a reproducible and extensible data collection pipeline.
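To make the calibration case study concrete, here is a minimal sketch of what a calibration check on market prices can look like: bin prices, then compare the mean price in each bin with how often the outcome actually occurred. This is not the paper's method. The function names `calibration_table` and `brier_score` are hypothetical, and the input values are made up for illustration, not drawn from the dataset.

```python
from collections import defaultdict


def calibration_table(prices, outcomes, n_bins=10):
    """Bin prices into equal-width bins over [0, 1] and return, per bin,
    (bin index, count, mean price, observed outcome frequency)."""
    bins = defaultdict(list)
    for p, y in zip(prices, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # a price of 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_price = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(y for _, y in pairs) / len(pairs)
        table.append((idx, len(pairs), mean_price, hit_rate))
    return table


def brier_score(prices, outcomes):
    """Mean squared difference between price and realized outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(prices, outcomes)) / len(prices)


# Made-up illustrative values (assumption), not data from the paper.
prices = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
outcomes = [0, 0, 1, 1, 1, 0]

table = calibration_table(prices, outcomes, n_bins=2)
score = brier_score(prices, outcomes)
for idx, count, mean_price, hit_rate in table:
    print(f"bin {idx}: n={count} mean_price={mean_price:.2f} hit_rate={hit_rate:.2f}")
print(f"Brier score: {score:.4f}")
```

In a well-calibrated market, the mean price and the observed frequency in each bin would be close; a lower Brier score indicates prices that track outcomes more closely.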
Abstract
Prediction markets are markets for trading claims on future events, such as presidential elections, and their prices provide continuously updated signals of collective beliefs. In decentralized platforms such as Polymarket, the market lifecycle spans market creation, token registration, trading, oracle interaction, dispute, and final settlement, yet the corresponding data are fragmented across heterogeneous off-chain and on-chain sources. We present the first continuously maintained dataset suite for the full lifecycle of decentralized prediction markets, built on Polymarket. To address the challenges of large-scale cross-source integration, incomplete linkage, and continuous synchronization, we build a unified relational data system that integrates three canonical layers: market metadata, fill-level trading records, and oracle-resolution events, through identifier resolution, on-chain…
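The three layers named in the abstract (market metadata, fill-level trading records, oracle-resolution events) can be pictured as relational tables linked by shared identifiers. The sketch below is only an illustration of that idea using an in-memory SQLite database. Every table name, column name, and row is an assumption made for this example; the actual schema and identifier-resolution logic of the dataset are not described in the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table per layer, plus a token table that links
# traded tokens back to their market.
conn.executescript("""
CREATE TABLE markets     (market_id TEXT PRIMARY KEY, question TEXT);
CREATE TABLE tokens      (token_id TEXT PRIMARY KEY, market_id TEXT, outcome TEXT);
CREATE TABLE fills       (fill_id INTEGER PRIMARY KEY, token_id TEXT, price REAL, size REAL);
CREATE TABLE resolutions (market_id TEXT PRIMARY KEY, winning_outcome TEXT);
""")

# Made-up rows for illustration only.
conn.executemany("INSERT INTO markets VALUES (?, ?)",
                 [("m1", "Example question A"), ("m2", "Example question B")])
conn.executemany("INSERT INTO tokens VALUES (?, ?, ?)",
                 [("t1", "m1", "YES"), ("t2", "m1", "NO"), ("t3", "m2", "YES")])
conn.executemany("INSERT INTO fills VALUES (?, ?, ?, ?)",
                 [(1, "t1", 0.5, 10), (2, "t1", 0.75, 4), (3, "t2", 0.25, 8)])
conn.execute("INSERT INTO resolutions VALUES (?, ?)", ("m1", "YES"))

# Join the layers: per market and outcome token, count fills, sum notional,
# and attach the resolved outcome if one exists. LEFT JOINs keep markets
# that have no fills or no resolution yet.
rows = conn.execute("""
SELECT m.market_id, t.outcome, COUNT(f.fill_id), SUM(f.price * f.size),
       r.winning_outcome
FROM markets m
JOIN tokens t ON t.market_id = m.market_id
LEFT JOIN fills f ON f.token_id = t.token_id
LEFT JOIN resolutions r ON r.market_id = m.market_id
GROUP BY m.market_id, t.outcome, r.winning_outcome
ORDER BY m.market_id, t.outcome
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOINs reflect the incomplete linkage the abstract mentions: a market can exist in the metadata layer without any matched fills or resolution event, and it still appears in the result with empty values.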
