Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset
Jie Zhu, Junhui Li, Yalong Wen, Lifan Guo

TL;DR
This paper introduces CFLUE, a Chinese financial language understanding benchmark that evaluates large language models on both knowledge assessment and application assessment tasks, and shows where current models still fall short.
Contribution
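The paper presents CFLUE, a new benchmark for Chinese financial NLP. It comprises 38K+ multiple-choice questions with solution explanations for knowledge assessment and 16K+ test instances for application assessment, enabling systematic evaluation of LLMs' knowledge and application capabilities.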
Findings
GPT-4 surpasses 60% accuracy in knowledge assessment
GPT-4 and GPT-4-turbo outperform lightweight LLMs in application tasks
Current LLMs still have significant room for improvement in financial NLP
Abstract
In light of recent breakthroughs in large language models (LLMs) that have revolutionized natural language processing (NLP), there is an urgent need for new benchmarks to keep pace with the fast development of LLMs. In this paper, we propose CFLUE, the Chinese Financial Language Understanding Evaluation benchmark, designed to assess the capability of LLMs across various dimensions. Specifically, CFLUE provides datasets tailored for both knowledge assessment and application assessment. In knowledge assessment, it consists of 38K+ multiple-choice questions with associated solution explanations. These questions serve dual purposes: answer prediction and question reasoning. In application assessment, CFLUE features 16K+ test instances across distinct groups of NLP tasks such as text classification, machine translation, relation extraction, reading comprehension, and text generation. Upon…
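The answer-prediction side of knowledge assessment can be illustrated with a minimal sketch. The record layout (`MCQItem` and its field names) and the exact-match scoring rule below are assumptions made for illustration; they are not CFLUE's actual schema or official evaluation script.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MCQItem:
    """Hypothetical record for one multiple-choice question."""
    question: str
    options: Dict[str, str]  # option label -> option text
    answer: str              # gold option label, e.g. "B"
    explanation: str         # solution explanation attached to the question


def answer_accuracy(items: List[MCQItem], predictions: List[str]) -> float:
    """Fraction of items whose predicted label matches the gold label.

    Assumes exact match on the label after trimming whitespace and
    ignoring case; a real harness may normalize model output differently.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(
        1
        for item, pred in zip(items, predictions)
        if pred.strip().upper() == item.answer.strip().upper()
    )
    return correct / len(items)


# Toy data, invented for this example only.
sample_items = [
    MCQItem("Q1", {"A": "x", "B": "y"}, "A", "because x"),
    MCQItem("Q2", {"A": "x", "B": "y"}, "B", "because y"),
    MCQItem("Q3", {"A": "x", "B": "y", "C": "z"}, "C", "because z"),
]
sample_predictions = ["a", " B ", "A"]
score = answer_accuracy(sample_items, sample_predictions)
print(f"accuracy = {score:.3f}")
```

Question reasoning, the second purpose named in the abstract, would compare a generated explanation against the `explanation` field and needs a text-similarity metric, which this sketch does not cover.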
Taxonomy
Topics: Stock Market Forecasting Methods
Methods: Attention Is All You Need · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Adam · Dropout
