FinBen: A Holistic Financial Benchmark for Large Language Models
Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yijing Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, Zhiyang Deng, Yuechen Jiang

TL;DR
FinBen is a comprehensive open-source benchmark for evaluating large language models in finance, covering diverse tasks and revealing strengths and weaknesses of models like GPT-4 and Gemini in financial applications.
Contribution
Introduces FinBen, the first extensive financial evaluation benchmark with new datasets, covering a wide range of financial tasks and enabling the first financial LLMs shared task.
Findings
LLMs excel in information extraction and textual analysis.
GPT-4 performs best in stock trading tasks.
Gemini excels in text generation and forecasting.
Abstract
LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading. Our evaluation of 15 representative LLMs, including…
Taxonomy
Topics: FinTech, Crowdfunding, Digital Finance
Methods: Linear Layer · Dense Connections · Label Smoothing · Adam · Attention Is All You Need · Softmax · Multi-Head Attention · Layer Normalization · Residual Connection · Absolute Position Encodings
