SysTradeBench: An Iterative Build-Test-Patch Benchmark for Strategy-to-Code Trading Systems with Drift-Aware Diagnostics

Yuchen Cao, Hanlin Zhang, Jacky Wai Keung, Yang Chen, and Linqi Song

arXiv:2604.04812 · cs.SE · April 7, 2026

TL;DR

SysTradeBench is a comprehensive benchmark for evaluating large language model-generated trading systems, emphasizing iterative development, drift detection, and multi-dimensional performance metrics.

Contribution

Introduces SysTradeBench, an iterative, diagnostics-enabled benchmark for strategy-to-code trading systems, highlighting the role of LLM iteration alongside human oversight.

Findings

1. Top models achieve over 91.7% validity.
2. Iteration induces code convergence across strategies.
3. LLMs excel at rapid prototyping and shallow bug fixes.

Abstract

Large language models (LLMs) are increasingly used as quantitative research copilots to translate natural-language strategy specifications into executable trading code. Yet most existing evaluations either focus on static financial knowledge or summarize performance with a single profitability metric, leaving a gap for benchmarking strategy-to-code trading systems as governed, auditable software. We introduce SysTradeBench (SysTB), an iterative build-test-patch benchmark that evaluates LLM-generated trading systems under drift-aware diagnostics. Given a standardized Base Strategy Doc and frozen semantics, each model must produce (i) a strategy card, (ii) executable code, and (iii) mandatory audit logs. A sandboxed harness runs determinism and anti-leakage checks, detects rule drift across iterations, and returns evidence bundles to support constrained patches. SysTradeBench reports…
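
The three harness checks named in the abstract (determinism, anti-leakage, and rule drift across iterations) can be illustrated with a minimal Python sketch. This is not the SysTradeBench harness: the function names, the list-of-signals strategy interface, and the dictionary representation of a strategy card's rules are all hypothetical, chosen only to make the ideas concrete.

```python
import hashlib
import json
from typing import Callable, Sequence

# Hypothetical interface: a strategy maps a price series to one signal per bar.
Strategy = Callable[[Sequence[float]], list]


def log_digest(signals: list) -> str:
    """Hash a run's output so repeated runs can be compared cheaply."""
    return hashlib.sha256(json.dumps(signals).encode()).hexdigest()


def is_deterministic(strategy: Strategy, prices: Sequence[float], runs: int = 3) -> bool:
    """Determinism check: identical inputs must yield identical outputs."""
    digests = {log_digest(strategy(prices)) for _ in range(runs)}
    return len(digests) == 1


def leaks_future(strategy: Strategy, prices: Sequence[float]) -> bool:
    """Anti-leakage check: the signal at bar t must not change when
    bars after t are withheld from the strategy."""
    full = strategy(prices)
    for t in range(1, len(prices) + 1):
        if strategy(prices[:t])[t - 1] != full[t - 1]:
            return True
    return False


def rule_drift(prev_rules: dict, new_rules: dict) -> dict:
    """Drift check: report every rule whose text changed between iterations."""
    keys = prev_rules.keys() | new_rules.keys()
    return {
        k: (prev_rules.get(k), new_rules.get(k))
        for k in sorted(keys)
        if prev_rules.get(k) != new_rules.get(k)
    }


def momentum_ok(prices: Sequence[float]) -> list:
    """Uses only the current and previous bar."""
    return [0] + [int(prices[t] > prices[t - 1]) for t in range(1, len(prices))]


def momentum_leaky(prices: Sequence[float]) -> list:
    """Deliberately flawed: peeks at the next bar."""
    n = len(prices)
    return [int(t + 1 < n and prices[t + 1] > prices[t]) for t in range(n)]


prices = [10.0, 11.0, 10.5, 12.0, 11.5]
drift = rule_drift(
    {"entry": "close > sma20", "stop": "2%"},
    {"entry": "close > sma50", "stop": "2%"},
)
```

In this toy setup, `momentum_ok` passes both the determinism and leakage checks, `momentum_leaky` is flagged by `leaks_future`, and `drift` contains only the changed `entry` rule. A real harness would presumably compare full audit logs rather than bare signal lists.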
