Towards Competent AI for Fundamental Analysis in Finance: A Benchmark Dataset and Evaluation
Zonghan Wu, Congyuan Zou, Junlin Wang, Chenhan Wang, Hangjing Yang, Yilei Shao

TL;DR
This paper introduces FinAR-Bench, a new benchmark dataset for evaluating large language models on financial statement analysis, focusing on real-world tasks like report generation and logical reasoning in finance.
Contribution
The paper presents a structured benchmark dataset, FinAR-Bench, that evaluates LLMs on key steps of financial analysis, addressing limitations of existing financial benchmarks.
Findings
LLMs show strengths in extracting financial information.
Performance varies across different analysis steps.
The benchmark reveals current limitations of LLMs in financial reasoning.
Abstract
Generative AI, particularly large language models (LLMs), is beginning to transform the financial industry by automating tasks and helping to make sense of complex financial information. One especially promising use case is the automatic creation of fundamental analysis reports, which are essential for making informed investment decisions, evaluating credit risks, guiding corporate mergers, etc. While LLMs attempt to generate these reports from a single prompt, the risks of inaccuracy are significant. Poor analysis can lead to misguided investments, regulatory issues, and loss of trust. Existing financial benchmarks mainly evaluate how well LLMs answer financial questions but do not reflect performance in real-world tasks like generating financial analysis reports. In this paper, we propose FinAR-Bench, a solid benchmark dataset focusing on financial statement analysis, a core…
