UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Yuzhe Yang, Yifei Zhang, Yan Hu, Yilin Guo, Ruoli Gan, Yueru He, Mingcong Lei, Xiao Zhang, Haining Wang, Qianqian Xie, Jimin Huang, Honghai Yu, and Benyou Wang

TL;DR
The paper presents UCFE, a comprehensive benchmark for evaluating large language models' ability to perform complex financial tasks, combining human feedback and dynamic interactions to better reflect real-world financial scenarios.
Contribution
Introduces a novel user-centric financial expertise benchmark that integrates human evaluations and dynamic interactions for assessing LLMs in finance.
Findings
Benchmark scores align closely with human preferences (Pearson correlation coefficient of 0.78).
Benchmark effectively evaluates LLMs' performance in complex financial tasks.
Provides a framework for ongoing assessment of LLMs in real-world financial applications.
Abstract
This paper introduces UCFE, the User-Centric Financial Expertise benchmark, an innovative framework designed to evaluate the ability of large language models (LLMs) to handle complex real-world financial tasks. The UCFE benchmark adopts a hybrid approach that combines human expert evaluations with dynamic, task-specific interactions to simulate the complexities of evolving financial scenarios. First, we conducted a user study involving 804 participants, collecting their feedback on financial tasks. Second, based on this feedback, we created a dataset that encompasses a wide range of user intents and interactions. This dataset serves as the foundation for benchmarking 11 LLM services using the LLM-as-Judge methodology. Our results show a significant alignment between benchmark scores and human preferences, with a Pearson correlation coefficient of 0.78, confirming the effectiveness of…
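The alignment claim in the abstract rests on a Pearson correlation between per-model benchmark scores and human preference scores. A minimal sketch of that computation follows, assuming one score of each kind per model; the function name `pearson` and the toy numbers are hypothetical illustrations, not the paper's code or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, one entry per model (NOT results from the paper).
benchmark_scores = [1.0, 2.0, 3.0, 4.0]
human_preference_scores = [2.0, 4.0, 6.0, 8.0]

r = pearson(benchmark_scores, human_preference_scores)
print(f"Pearson r = {r:.2f}")
```

A coefficient near 1 means models ranked highly by the benchmark also tend to be preferred by humans; the reported 0.78 indicates strong but not perfect agreement.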
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
