Market-Bench: Evaluating Large Language Models on Introductory Quantitative Trading and Market Dynamics

Abhay Srivastava; Sam Jung; Spencer Mateega

arXiv:2512.12264 · cs.CL · January 22, 2026

Open Access

TL;DR

This paper introduces MARKET-BENCH, a benchmark for evaluating large language models on quantitative trading tasks, measuring their ability to generate executable backtesters and accurately predict trading metrics.

Contribution

It presents a benchmark for assessing LLMs on trading-strategy implementation and evaluates thirteen state-of-the-art models on it.

Findings

01 Most models reliably execute simple strategies

02 Gemini 3 Pro and Claude 4.5 Sonnet show strong reliability and low error

03 GPT-5.2 achieves the best overall performance

Abstract

We introduce MARKET-BENCH, a benchmark that evaluates large language models (LLMs) on introductory quantitative trading tasks by asking them to construct executable backtesters from natural language strategy descriptions and market assumptions. Each instance specifies one of three canonical strategies: scheduled trading on Microsoft (NASDAQ: MSFT), pairs trading on Coca-Cola (NYSE: KO) and Pepsi (NASDAQ: PEP), or delta hedging on MSFT. Models must produce code whose profit and loss (P&L), drawdown, and position paths match a verifiable reference implementation. We assess thirteen state-of-the-art models using a multi-round evaluation that separates structural reliability (whether the backtest runs) from numerical accuracy (mean absolute error of the backtest metrics), assigning failed outputs a duplicated-metrics baseline MAE. While most models reliably execute the simplest…
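The two-part scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the metric keys (`pnl`, `max_drawdown`), and the sample numbers are hypothetical. The abstract does not say how the duplicated-metrics baseline MAE is computed, so the sketch takes it as an input (`baseline_mae`) rather than deriving it.

```python
from statistics import mean
from typing import Dict, List, Optional

Metrics = Dict[str, float]


def metric_mae(predicted: Metrics, reference: Metrics) -> float:
    """Mean absolute error over the metrics named in the reference."""
    return mean(abs(predicted[name] - reference[name]) for name in reference)


def score_rounds(
    rounds: List[Optional[Metrics]],
    reference: Metrics,
    baseline_mae: float,
) -> Dict[str, float]:
    """Score several attempts at one task.

    Each entry in `rounds` is the metrics from one generated backtester,
    or None if that backtester failed to run. Failed rounds are charged
    `baseline_mae` (an assumed, externally supplied penalty).
    """
    ran = [r is not None for r in rounds]
    errors = [
        baseline_mae if r is None else metric_mae(r, reference)
        for r in rounds
    ]
    return {
        "pass_rate": sum(ran) / len(ran),  # structural reliability
        "mean_mae": mean(errors),          # numerical accuracy, with penalties
        "best_mae": min(errors),           # best single round
    }


# Hypothetical example: three rounds, the second one fails to run.
reference = {"pnl": 100.0, "max_drawdown": -20.0}
rounds = [
    {"pnl": 110.0, "max_drawdown": -20.0},
    None,
    {"pnl": 100.0, "max_drawdown": -24.0},
]
result = score_rounds(rounds, reference, baseline_mae=50.0)
```

Reporting the pass rate separately from the error keeps a model that often fails to run distinguishable from one that runs but produces inaccurate metrics, which is the separation the abstract describes.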

Taxonomy

Topics: Stock Market Forecasting Methods · Financial Markets and Investment Strategies · Sports Analytics and Performance