Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset

Jie Zhu; Junhui Li; Yalong Wen; Lifan Guo

arXiv:2405.10542·cs.CL·May 20, 2024

TL;DR

This paper introduces CFLUE, a Chinese financial language understanding benchmark that evaluates large language models on both knowledge and application tasks and shows that current models still have significant room for improvement.

Contribution

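The paper presents CFLUE, a new benchmark for Chinese financial NLP. It combines 38K+ multiple-choice questions with solution explanations (knowledge assessment) and 16K+ test instances across several NLP task groups (application assessment), enabling systematic evaluation of LLMs' knowledge and application capabilities.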

Findings

01. GPT-4 surpasses 60% accuracy in knowledge assessment

02. GPT-4 and GPT-4-turbo outperform lightweight LLMs in application tasks

03. Current LLMs still have significant room for improvement in financial NLP

Abstract

In light of recent breakthroughs in large language models (LLMs) that have revolutionized natural language processing (NLP), there is an urgent need for new benchmarks to keep pace with the fast development of LLMs. In this paper, we propose CFLUE, the Chinese Financial Language Understanding Evaluation benchmark, designed to assess the capability of LLMs across various dimensions. Specifically, CFLUE provides datasets tailored for both knowledge assessment and application assessment. In knowledge assessment, it consists of 38K+ multiple-choice questions with associated solution explanations. These questions serve dual purposes: answer prediction and question reasoning. In application assessment, CFLUE features 16K+ test instances across distinct groups of NLP tasks such as text classification, machine translation, relation extraction, reading comprehension, and text generation. Upon…
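To make the answer-prediction setting in the knowledge assessment concrete, here is a minimal sketch of how accuracy over multiple-choice questions might be scored. The record layout (`MCQuestion` and its fields), the single-letter answer format, and the `accuracy` helper are illustrative assumptions and are not taken from the CFLUE release.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MCQuestion:
    """Hypothetical record for one multiple-choice item (layout is assumed)."""
    question: str
    options: Dict[str, str]   # option label -> option text
    answer: str               # gold option label, e.g. "B" (assumed single-answer)
    explanation: str          # solution explanation, used for question reasoning


def accuracy(questions: List[MCQuestion], predictions: List[str]) -> float:
    """Fraction of items whose predicted label matches the gold label."""
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    if not questions:
        return 0.0
    correct = sum(
        1
        for q, p in zip(questions, predictions)
        # Normalize whitespace and case before comparing labels.
        if p.strip().upper() == q.answer.strip().upper()
    )
    return correct / len(questions)


if __name__ == "__main__":
    # Toy items with placeholder content, not real CFLUE data.
    sample = [
        MCQuestion("Q1", {"A": "x", "B": "y"}, "A", "placeholder explanation"),
        MCQuestion("Q2", {"A": "x", "B": "y"}, "B", "placeholder explanation"),
        MCQuestion("Q3", {"A": "x", "B": "y"}, "A", "placeholder explanation"),
    ]
    print(accuracy(sample, ["a", "B", "B"]))
```

Under this sketch, the 60% figure reported in the findings would correspond to `accuracy(...)` returning a value above 0.6 on the knowledge-assessment questions. Scoring of the explanation (question reasoning) output would need a separate text-generation metric and is not covered here.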

Taxonomy

Topics: Stock Market Forecasting Methods

Methods: Attention Is All You Need · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Adam · Dropout