CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model

Yang Lei; Jiangtong Li; Dawei Cheng; Zhijun Ding; Changjun Jiang

arXiv:2311.05812 · cs.CL · May 22, 2024 · 5 cites

TL;DR

CFBenchmark is a new benchmark designed to evaluate Chinese financial language models across recognition, classification, and generation tasks, revealing current models' strengths and weaknesses in financial text processing.

Contribution

This work introduces CFBenchmark, the first comprehensive Chinese financial assistant benchmark for LLMs. It covers eight tasks and texts ranging from 50 to over 1,800 characters, and initial experiments on existing LLMs highlight where they still fall short.

Findings

1. Some LLMs excel in specific tasks.
2. Overall performance indicates significant room for improvement.
3. The benchmark covers diverse financial text processing tasks.

Abstract

Large language models (LLMs) have demonstrated great potential in the financial domain, so it is important to assess how well LLMs perform on financial tasks. In this work, we introduce CFBenchmark to evaluate the performance of LLMs as Chinese financial assistants. The basic version, CFBenchmark-Basic, evaluates the basic ability of LLMs in Chinese financial text processing from three aspects (i.e., recognition, classification, and generation), comprising eight tasks, and includes financial texts ranging in length from 50 to over 1,800 characters. We conduct experiments on several LLMs available in the literature with CFBenchmark-Basic, and the results indicate that while some LLMs show outstanding performance in specific tasks, overall, existing models still have significant room for improvement on basic financial text processing tasks. In…
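The abstract groups eight tasks under three aspects and reports that models can excel on specific tasks while lagging overall. A minimal Python sketch of how per-task scores could be rolled up into per-aspect and overall scores to make that pattern visible follows. It assumes scores are already normalized to [0, 1]; the task names, the split of tasks across aspects, and the macro-averaging scheme are hypothetical and are not taken from the CFBenchmark code.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL placeholders: eight tasks grouped under the three aspects named in
# the abstract. The real CFBenchmark-Basic task names and split may differ.
TASK_ASPECT = {
    "recognition_a": "recognition",
    "recognition_b": "recognition",
    "classification_a": "classification",
    "classification_b": "classification",
    "classification_c": "classification",
    "generation_a": "generation",
    "generation_b": "generation",
    "generation_c": "generation",
}


def aspect_scores(task_scores: dict[str, float]) -> dict[str, float]:
    """Average per-task scores (assumed normalized to [0, 1]) within each aspect."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for task, score in task_scores.items():
        if task not in TASK_ASPECT:
            raise ValueError(f"unknown task: {task}")
        grouped[TASK_ASPECT[task]].append(score)
    return {aspect: mean(scores) for aspect, scores in grouped.items()}


def overall_score(task_scores: dict[str, float]) -> float:
    """Macro-average over aspects (an assumed scheme), so each aspect counts
    equally regardless of how many tasks it contains."""
    return mean(aspect_scores(task_scores).values())
```

Averaging within each aspect before averaging across aspects keeps an aspect with more tasks from dominating the overall number. A plain mean over all eight tasks would be an equally reasonable alternative.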

Code & Models

Repositories

tongjifinlab/cfbenchmark
PyTorch · Official

Datasets

TongjiFinLab/CFBenchmark
dataset · 387 downloads

Taxonomy

Topics: Topic Modeling