FinEval-KR: A Financial Domain Evaluation Framework for Large Language Models' Knowledge and Reasoning

Shaoyu Dou; Yutian Shen; Mofan Chen; Zixuan Wang; Jiajie Xu; Qi Guo; Kailai Shao; Chao Chen; Haixiang Hu; Haibo Shi; Min Min; Liwen Zhang

arXiv:2506.21591 · cs.CL · November 7, 2025

TL;DR

This paper introduces FinEval-KR, a framework for separately evaluating large language models' financial knowledge and reasoning abilities, along with a new dataset, revealing key factors affecting reasoning accuracy.

Contribution

The paper presents a novel evaluation framework that decouples knowledge and reasoning in LLMs, introduces a cognitive score based on Bloom's taxonomy, and provides a new Chinese financial reasoning dataset.

Findings

01. LLM reasoning and higher-order cognitive abilities are crucial for accuracy.

02. Top models still struggle with knowledge application.

03. Specialized financial LLMs lag behind general models.

Abstract

Large Language Models (LLMs) demonstrate significant potential but face challenges in complex financial reasoning tasks that require both domain knowledge and sophisticated reasoning. Current evaluation benchmarks often fall short: they do not decouple indicators of these two capabilities from single-task performance, and they lack root-cause analysis of task failures. To address this, we introduce FinEval-KR, a novel evaluation framework that decouples LLMs' knowledge and reasoning abilities and quantifies them independently, proposing distinct knowledge score and reasoning score metrics. Inspired by cognitive science, we further propose a cognitive score based on Bloom's taxonomy to analyze capabilities in reasoning tasks across different cognitive levels. We also release a new open-source Chinese financial reasoning dataset covering 22 subfields to support reproducible research and further advancements in…

Taxonomy

Topics: Text Readability and Simplification · Artificial Intelligence in Healthcare and Education · Topic Modeling