Market-Bench: Evaluating Large Language Models on Introductory Quantitative Trading and Market Dynamics
Abhay Srivastava, Sam Jung, Spencer Mateega

TL;DR
This paper introduces MARKET-BENCH, a benchmark that evaluates large language models on introductory quantitative trading tasks, measuring whether they can generate executable backtesters whose trading metrics match a reference implementation.
Contribution
It presents a benchmark for assessing LLMs on trading strategy implementation and uses it to evaluate thirteen state-of-the-art models.
Findings
Most models reliably execute simple strategies
Gemini 3 Pro and Claude 4.5 Sonnet show strong reliability and low error
GPT-5.2 achieves the best overall performance
Abstract
We introduce MARKET-BENCH, a benchmark that evaluates large language models (LLMs) on introductory quantitative trading tasks by asking them to construct executable backtesters from natural language strategy descriptions and market assumptions. Each instance specifies one of three canonical strategies: scheduled trading on Microsoft (NASDAQ: MSFT), pairs trading on Coca-Cola (NYSE: KO) and Pepsi (NASDAQ: PEP), or delta hedging on MSFT. Models must produce code whose profit and loss (P&L), drawdown, and position paths match a verifiable reference implementation. We assess thirteen state-of-the-art models using a multi-round evaluation that separates structural reliability (whether the backtest runs) from numerical accuracy (mean absolute error of the backtest metrics), assigning failed outputs a duplicated-metrics baseline MAE. While most models reliably execute the simplest…
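The two-part scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flat list of metrics per round, and the `failure_mae` argument are all assumptions made for illustration. In particular, the excerpt does not define how the "duplicated-metrics baseline MAE" is computed, so the penalty for a failed run is simply passed in as a placeholder value.

```python
from typing import Optional, Sequence, Tuple


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough drop of an equity curve, in currency units."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)            # highest equity seen so far
        worst = max(worst, peak - value)   # deepest drop below that peak
    return worst


def mean_absolute_error(predicted: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Mean absolute difference between two equally long metric vectors."""
    if len(predicted) != len(reference):
        raise ValueError("metric vectors must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def score_rounds(outputs: Sequence[Optional[Sequence[float]]],
                 reference: Sequence[float],
                 failure_mae: float) -> Tuple[float, float]:
    """Return (reliability, mean MAE) over several evaluation rounds.

    `outputs` holds one metric vector per round, or None when the generated
    backtester failed to run. `failure_mae` is a hypothetical stand-in for
    the paper's baseline penalty, whose definition is not given here.
    """
    ran = [o for o in outputs if o is not None]
    reliability = len(ran) / len(outputs)  # structural reliability
    errors = [
        mean_absolute_error(o, reference) if o is not None else failure_mae
        for o in outputs
    ]
    return reliability, sum(errors) / len(errors)  # numerical accuracy
```

Keeping reliability and error as separate return values mirrors the abstract's distinction: a model that crashes often and a model that runs but computes wrong numbers can end up with similar average error, yet they fail in different ways.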
Taxonomy
Topics: Stock Market Forecasting Methods · Financial Markets and Investment Strategies · Sports Analytics and Performance
