PolyBench: Benchmarking LLM Forecasting and Trading Capabilities on Live Prediction Market Data

Pu Cheng; Juncheng Liu; Yunshen Long

arXiv:2604.14199·q-fin.CP·April 17, 2026

TL;DR

PolyBench is a benchmark dataset and evaluation framework that tests how well large language models forecast and trade on live Polymarket prediction-market data. It pairs each market snapshot with order-book state and a real-time news stream, and scores models on financial metrics as well as accuracy.

Contribution

The paper introduces PolyBench, a multimodal benchmark built from real Polymarket data, and uses it to evaluate the forecasting and trading performance of seven LLMs under identical, timestamp-locked market states.

Findings

01 Only two of the seven evaluated models achieved positive financial returns.

02 Models showed a gap between language fluency and probabilistic reasoning.

03 PolyBench provides a contamination-proof, financially grounded evaluation standard.

Abstract

Predicting real-world events from live market signals demands systems that fuse qualitative news with quantitative order-book dynamics under strict temporal discipline, a challenge existing benchmarks fail to capture. We present PolyBench, a multimodal benchmark derived from Polymarket that records point-in-time cross-sections of 38,666 binary prediction markets spanning 4,997 events, synchronously coupling each snapshot with a Central Limit Order Book (CLOB) state and a real-time news stream. Using PolyBench, we evaluate seven state-of-the-art Large Language Models, spanning open- and closed-source families, generating 36,165 predictions under identical, timestamp-locked market states collected between February 6 and 12, 2026. Our multidimensional framework assesses directional accuracy, our proposed Confidence-Weighted Return (CWR), Annualized Percentage Yield (APY),…
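
As an illustration of the simplest metric named in the abstract, directional accuracy on binary markets can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the truncated abstract does not define the metric, so the 0.5 decision threshold, the tie-handling rule, and the names `Prediction` and `directional_accuracy` are all hypothetical. CWR and APY are not sketched because their definitions are not given here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Prediction:
    """One model forecast on one binary market (hypothetical record layout)."""
    prob_yes: float      # model's probability that the market resolves YES
    resolved_yes: bool   # how the market actually resolved


def directional_accuracy(preds: Sequence[Prediction]) -> float:
    """Fraction of forecasts whose implied side matches the resolution.

    Assumption: a forecast counts as "YES" when prob_yes >= 0.5,
    so an exact 0.5 is treated as a YES call.
    """
    if not preds:
        raise ValueError("need at least one prediction")
    correct = sum((p.prob_yes >= 0.5) == p.resolved_yes for p in preds)
    return correct / len(preds)


if __name__ == "__main__":
    sample = [
        Prediction(0.8, True),    # called YES, resolved YES: correct
        Prediction(0.3, True),    # called NO, resolved YES: wrong
        Prediction(0.1, False),   # called NO, resolved NO: correct
        Prediction(0.6, False),   # called YES, resolved NO: wrong
    ]
    print(directional_accuracy(sample))
```

Directional accuracy ignores how confident each forecast was, which is presumably why the abstract pairs it with confidence-weighted and return-based metrics.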

Code & Models

Repositories

PolyBench/PolyBench (GitHub)
