CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model
Yang Lei, Jiangtong Li, Dawei Cheng, Zhijun Ding, Changjun Jiang

TL;DR
CFBenchmark is a benchmark for evaluating large language models as Chinese financial assistants across recognition, classification, and generation tasks. Initial results show where current models are strong and where they fall short in financial text processing.
Contribution
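This work introduces CFBenchmark, the first comprehensive Chinese financial assistant benchmark for LLMs. Its basic version covers eight tasks across three aspects (recognition, classification, and generation) on financial texts ranging from 50 to over 1,800 characters, and the initial experimental results highlight areas for improvement.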
Findings
Some LLMs excel in specific tasks
Overall performance indicates significant room for improvement
Benchmark covers diverse financial text processing tasks
Abstract
Large language models (LLMs) have demonstrated great potential in the financial domain, so it is important to assess their performance on financial tasks. In this work, we introduce CFBenchmark to evaluate the performance of LLMs as Chinese financial assistants. The basic version of CFBenchmark is designed to evaluate basic ability in Chinese financial text processing from three aspects (i.e., recognition, classification, and generation) comprising eight tasks, and includes financial texts ranging in length from 50 to over 1,800 characters. We conduct experiments on several LLMs available in the literature with CFBenchmark-Basic. The experimental results indicate that while some LLMs show outstanding performance in specific tasks, there is still significant room for improvement overall in basic financial text processing with existing models. In…
Taxonomy
Topics: Topic Modeling
