FinanceQA: A Benchmark for Evaluating Financial Analysis Capabilities of Large Language Models
Spencer Mateega, Carlos Georgescu, Danny Tang

TL;DR
FinanceQA is a benchmark that evaluates large language models on complex financial analysis tasks. Current models fail roughly 60% of realistic tasks, which points to a need for better training data.
Contribution
The paper presents a new benchmark suite for assessing LLMs on realistic financial analysis tasks, exposing current limitations and proposing data improvements.
Findings
Current LLMs fail about 60% of financial tasks
Higher-quality training data improves model performance
Existing models struggle with multi-step financial reasoning
Abstract
FinanceQA is a testing suite that evaluates LLMs' performance on complex numerical financial analysis tasks that mirror real-world investment work. Despite recent advances, current LLMs fail to meet the strict accuracy requirements of financial institutions, with models failing approximately 60% of realistic tasks that mimic on-the-job analyses at hedge funds, private equity firms, investment banks, and other financial institutions. The primary challenges include hand-spreading metrics, adhering to standard accounting and corporate valuation conventions, and performing analysis under incomplete information, particularly in multi-step tasks requiring assumption generation. This performance gap highlights the disconnect between existing LLM capabilities and the demands of professional financial analysis, which current testing architectures test inadequately. Results show that…
Taxonomy
Topics: Stock Market Forecasting Methods · Mathematics, Computing, and Information Processing · Scientific Computing and Data Management
