AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models
Wentao Zhang, Mingxuan Zhao, Jincheng Gao, Jieshun You, Huaiyu Jia, Yilei Zhao, Bo An, Shuo Sun

TL;DR
AlphaForgeBench is a benchmarking framework that uses large language models as financial reasoning tools to generate trading strategies, addressing the instability found in previous LLM-based trading evaluations.
Contribution
The paper reframes LLMs as research assistants for strategy development, which enables deterministic, reproducible evaluation and makes financial benchmarking more reliable.
Findings
LLM-based trading agents show high run-to-run variance and irrational action flipping across adjacent time steps.
AlphaForgeBench effectively reduces execution instability.
The framework enhances evaluation of financial reasoning and strategy formulation.
Abstract
The rapid advancement of Large Language Models (LLMs) has led to a surge of financial benchmarks, evolving from static knowledge tests to interactive trading simulations. However, current evaluations of real-time trading performance overlook a critical failure mode: severe behavioral instability in sequential decision-making under uncertainty. We empirically show that LLM-based trading agents exhibit extreme run-to-run variance, inconsistent action sequences even under deterministic decoding, and irrational action flipping across adjacent time steps. These issues stem from stateless autoregressive architectures lacking persistent action memory, as well as sensitivity to continuous-to-discrete action mappings in portfolio allocation. As a result, many existing financial trading benchmarks produce unreliable, non-reproducible, and uninformative evaluations. To address these limitations,…
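The failure modes named in the abstract can be made concrete with a few simple diagnostics. The following is a minimal sketch, not the paper's actual metrics: the function names (`flip_rate`, `run_to_run_dispersion`, `to_action`), the three-action vocabulary (`buy`, `hold`, `sell`), and the 0.05 no-trade band are all assumptions chosen for illustration.

```python
from statistics import pstdev

# Hypothetical action vocabulary; the paper's action space may differ.
OPPOSITE = {("buy", "sell"), ("sell", "buy")}


def flip_rate(actions):
    """Fraction of adjacent time steps where the action reverses (buy <-> sell)."""
    pairs = list(zip(actions, actions[1:]))
    if not pairs:
        return 0.0
    return sum(1 for pair in pairs if pair in OPPOSITE) / len(pairs)


def run_to_run_dispersion(final_returns):
    """Population standard deviation of final returns over repeated runs
    of the same agent on the same data."""
    return pstdev(final_returns)


def to_action(weight_change, band=0.05):
    """Map a continuous change in target portfolio weight to a discrete action.

    `band` is an assumed no-trade threshold. Values just on either side of it
    produce different actions, which illustrates the sensitivity of
    continuous-to-discrete mappings.
    """
    if weight_change > band:
        return "buy"
    if weight_change < -band:
        return "sell"
    return "hold"


if __name__ == "__main__":
    # Two of the three adjacent pairs reverse direction.
    print(flip_rate(["buy", "sell", "buy", "hold"]))
    # Identical runs give zero dispersion; a reproducible benchmark aims for this.
    print(run_to_run_dispersion([0.10, 0.10, 0.10]))
    # A 0.002 difference in the continuous output changes the discrete action.
    print(to_action(0.049), to_action(0.051))
```

Under these assumptions, a stable agent has a flip rate near zero and near-zero dispersion across repeated runs, while the `to_action` example shows how a negligible change in a continuous output can still change the trade that is executed.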
Taxonomy
Topics: Stock Market Forecasting Methods · Financial Markets and Investment Strategies · Complex Systems and Time Series Analysis
