TraderBench: How Robust Are AI Agents in Adversarial Capital Markets?
Xiaochuang Yuan, Hui Xu, Silvia Xu, Cui Zou, Jing Xiong

TL;DR
TraderBench is a new benchmark combining static expert-verified tasks and adversarial trading simulations to evaluate AI agents' robustness and adaptability in dynamic financial markets.
Contribution
It introduces an evaluation framework that scores trading purely on realized performance, avoiding LLM-judge variance, and adds novel trading tracks and market-manipulation scenarios that address the limitations of prior static benchmarks.
Findings
Eight of the 13 models scored around 33 on the crypto tasks with minimal variation, indicating fixed strategies.
Extended reasoning improves knowledge retrieval significantly but has minimal effect on trading performance.
Current AI agents lack genuine market adaptation, highlighting the need for performance-grounded evaluation.
Abstract
Evaluating AI agents in finance faces two key challenges: static benchmarks require costly expert annotation yet miss the dynamic decision-making central to real-world trading, while LLM-based judges introduce uncontrolled variance on domain-specific tasks. We introduce TraderBench, a benchmark that addresses both issues. It combines expert-verified static tasks (knowledge retrieval, analytical reasoning) with adversarial trading simulations scored purely on realized performance (Sharpe ratio, returns, and drawdown), eliminating judge variance entirely. The framework features two novel tracks: crypto trading with four progressive market-manipulation transforms, and options derivatives scored on P&L accuracy, Greeks, and risk management. Trading scenarios can be refreshed with new market data to prevent benchmark contamination. Evaluating 13 models (8B open-source to frontier) on ~50…
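To make "scored purely on realized performance" concrete, the sketch below computes the three named metrics from an agent's equity curve using common textbook definitions. The abstract does not state TraderBench's exact conventions (return frequency, annualization factor, risk-free rate, or how the metrics are combined into a track score), so the function name `performance_metrics` and the parameters `periods_per_year` and `risk_free_rate` are hypothetical illustrations, not the benchmark's actual scorer.

```python
import math
from statistics import mean, stdev


def performance_metrics(equity, periods_per_year=252, risk_free_rate=0.0):
    """Hypothetical scorer: total return, annualized Sharpe ratio, and
    maximum drawdown from an equity curve (portfolio value per period).

    Assumes simple per-period returns and a constant annual risk-free rate;
    these conventions are illustrative, not TraderBench's documented ones.
    """
    if len(equity) < 3:
        # stdev() needs at least two returns, i.e. three equity points
        raise ValueError("need at least three equity observations")

    # Simple per-period returns between consecutive equity values
    returns = [equity[i] / equity[i - 1] - 1.0 for i in range(1, len(equity))]
    total_return = equity[-1] / equity[0] - 1.0

    # Sharpe ratio: mean excess return over its standard deviation,
    # scaled by sqrt(periods_per_year) to annualize
    rf_per_period = risk_free_rate / periods_per_year
    excess = [r - rf_per_period for r in returns]
    sd = stdev(excess)
    sharpe = mean(excess) / sd * math.sqrt(periods_per_year) if sd > 0 else 0.0

    # Maximum drawdown: largest peak-to-trough decline relative to the running peak
    peak = equity[0]
    max_drawdown = 0.0
    for value in equity:
        peak = max(peak, value)
        max_drawdown = max(max_drawdown, (peak - value) / peak)

    return {"total_return": total_return, "sharpe": sharpe, "max_drawdown": max_drawdown}


if __name__ == "__main__":
    # Toy equity curve: +10%, then -10%, then a new high
    print(performance_metrics([100, 110, 99, 121]))
```

Because every metric is a deterministic function of the simulated equity curve, rerunning the scorer on the same trajectory always yields the same score, which is the property the abstract contrasts with LLM-judge variance.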
Taxonomy
Topics: Stock Market Forecasting Methods · Financial Markets and Investment Strategies · Blockchain Technology Applications and Security
