QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies
Alexey Khoroshilov, Alexey Chernysh, Orkhan Ekhtibarov, Nini Kamkia, Dmitry Zmitrovich

TL;DR
QuantCode-Bench is a new benchmark designed to evaluate large language models' ability to generate executable algorithmic trading strategies from natural language descriptions, emphasizing domain-specific logic and API usage.
Contribution
The paper introduces QuantCode-Bench, a 400-task benchmark for assessing how well LLMs generate trading strategies for the Backtrader framework from textual descriptions, and analyzes the limitations of current models in this domain.
Findings
Current models struggle to turn trading logic into working code and to use the framework API correctly.
Success depends on aligning natural language, financial logic, and data behavior.
Most failures are not due to syntax but to semantic and operational errors.
Abstract
Large language models have demonstrated strong performance on general-purpose programming tasks, yet their ability to generate executable algorithmic trading strategies remains underexplored. Unlike standard code benchmarks, trading-strategy generation requires simultaneous mastery of domain-specific financial logic, knowledge of a specialized API, and the ability to produce code that is not only syntactically correct but also leads to actual trades on historical data. In this work, we present QuantCode-Bench, a benchmark for the systematic evaluation of modern LLMs in generating strategies for the Backtrader framework from textual descriptions in English. The benchmark contains 400 tasks of varying difficulty collected from Reddit, TradingView, StackExchange, GitHub, and synthetic sources. Evaluation is conducted through a multi-stage pipeline that checks syntactic correctness,…
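The staged evaluation described in the abstract can be illustrated with a minimal sketch. The abstract is truncated here, so only two checks are taken from the text: syntactic correctness, and whether the strategy leads to actual trades. Everything else is an assumption made for illustration and not the authors' pipeline: the names `StageResult`, `check_syntax`, `check_defines_strategy`, and `evaluate`, the intermediate check for a `Strategy` subclass, and the caller-supplied `run_backtest` callable that would wrap a real Backtrader run and return a trade count.

```python
import ast
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StageResult:
    stage: str
    passed: bool
    detail: str = ""


def check_syntax(source: str) -> StageResult:
    """Stage named in the abstract: does the generated code parse as Python?"""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return StageResult("syntax", False, f"line {exc.lineno}: {exc.msg}")
    return StageResult("syntax", True)


def check_defines_strategy(source: str) -> StageResult:
    """Assumed stage: is there a class whose base is named Strategy?"""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                # Handles both `bt.Strategy` (Attribute) and `Strategy` (Name).
                if isinstance(base, ast.Attribute):
                    name = base.attr
                else:
                    name = getattr(base, "id", None)
                if name == "Strategy":
                    return StageResult("strategy_class", True, node.name)
    return StageResult("strategy_class", False, "no Strategy subclass found")


def evaluate(
    source: str,
    run_backtest: Optional[Callable[[str], int]] = None,
) -> List[StageResult]:
    """Run the stages in order and stop at the first failure.

    `run_backtest` is a hypothetical hook: it should execute the strategy
    on historical data and return the number of trades it produced.
    """
    results = [check_syntax(source)]
    if not results[-1].passed:
        return results
    results.append(check_defines_strategy(source))
    if not results[-1].passed or run_backtest is None:
        return results
    trades = run_backtest(source)
    results.append(StageResult("trades", trades > 0, f"{trades} trades"))
    return results
```

Stopping at the first failed stage makes it possible to attribute each failure to a single cause, which is how a code sample that parses cleanly can still be counted as failing because it never trades.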
