BizFinBench.v2: A Unified Dual-Mode Bilingual Benchmark for Expert-Level Financial Capability Alignment
Xin Guo, Rongjunchen Zhang, Guilong Lu, Xuntao Guo, Shuai Jia, Zhi Yang, Liwen Zhang

TL;DR
BizFinBench.v2 is a comprehensive, real-world benchmark for evaluating large language models' financial capabilities using authentic Chinese and U.S. market data, addressing previous limitations of simulated and static assessments.
Contribution
Introduces BizFinBench.v2, the first authentic, dual-mode bilingual benchmark with online assessment for expert-level financial tasks, covering 29,578 Q&A pairs across core business scenarios.
Findings
ChatGPT-5 achieves 61.5% accuracy on main tasks.
DeepSeek-R1 outperforms other commercial LLMs in online tasks.
Error analysis reveals specific capability gaps in current models.
Abstract
Large language models have undergone rapid evolution, emerging as a pivotal technology for intelligence in financial operations. However, existing benchmarks are often constrained by pitfalls such as reliance on simulated or general-purpose samples and a focus on singular, offline static scenarios. Consequently, they fail to align with the requirements for authenticity and real-time responsiveness in financial services, leading to a significant discrepancy between benchmark performance and actual operational efficacy. To address this, we introduce BizFinBench.v2, the first large-scale evaluation benchmark grounded in authentic business data from both Chinese and U.S. equity markets, integrating online assessment. We performed clustering analysis on authentic user queries from financial platforms, resulting in eight fundamental tasks and two online tasks across four core business…
Taxonomy
Topics: Stock Market Forecasting Methods · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
