FinMaster: A Holistic Benchmark for Mastering Full-Pipeline Financial Workflows with LLMs
Junzhe Jiang, Chang Yang, Aixin Cui, Sihan Jin, Ruiyu Wang, Bo Li, Xiao Huang, Dongning Sun, Xinrun Wang

TL;DR
FinMaster is a comprehensive benchmark designed to evaluate large language models across full-pipeline financial workflows, highlighting significant gaps in reasoning and accuracy that need addressing for real-world financial applications.
Contribution
This paper introduces FinMaster, the first holistic benchmark covering diverse financial tasks and workflows to systematically assess LLM capabilities in finance.
Findings
LLMs show high accuracy on simple financial tasks but struggle with complex multi-step reasoning.
Accuracy drops from over 90% on basic tasks to around 40% on complex scenarios.
Error propagation significantly impacts multi-metric financial calculations.
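The error-propagation finding can be illustrated with a minimal sketch. It is not taken from the FinMaster code. The function name, the three-metric chain, and all figures are hypothetical, chosen only to show how one upstream mistake carries into every metric derived from it.

```python
def derived_metrics(revenue, cogs, operating_expenses):
    """Compute a small chain of dependent metrics (hypothetical example)."""
    gross_profit = revenue - cogs                          # step 1
    operating_income = gross_profit - operating_expenses   # step 2 depends on step 1
    operating_margin = operating_income / revenue          # step 3 depends on step 2
    return {
        "gross_profit": gross_profit,
        "operating_income": operating_income,
        "operating_margin": operating_margin,
    }


# Hypothetical reference figures.
truth = derived_metrics(revenue=1000.0, cogs=600.0, operating_expenses=250.0)

# Same scenario, but with a single upstream error: COGS read as 500 instead of 600.
faulty = derived_metrics(revenue=1000.0, cogs=500.0, operating_expenses=250.0)

# Collect every metric whose value differs from the reference.
wrong = [name for name in truth if abs(truth[name] - faulty[name]) > 1e-9]
print(wrong)
```

In this toy chain one misread input makes all three downstream metrics incorrect, which is why scoring each metric independently can understate how much a single early mistake costs.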
Abstract
Financial tasks are pivotal to global economic stability; however, their execution faces challenges including labor-intensive processes, low error tolerance, data fragmentation, and tool limitations. Although large language models (LLMs) have succeeded in various natural language processing tasks and have shown potential in automating workflows through reasoning and contextual understanding, current benchmarks for evaluating LLMs in finance lack sufficient domain-specific data, rely on simplistic task designs, and offer incomplete evaluation frameworks. To address these gaps, this article presents FinMaster, a comprehensive financial benchmark designed to systematically assess the capabilities of LLMs in financial literacy, accounting, auditing, and consulting. Specifically, FinMaster comprises three main modules: i) FinSim, which builds simulators that generate synthetic, privacy-compliant…
Taxonomy
Topics: Advanced Data Storage Technologies · FinTech, Crowdfunding, Digital Finance · Scientific Computing and Data Management
