C$^{3}$Bench: A Comprehensive Classical Chinese Understanding Benchmark for Large Language Models
Jiahuan Cao, Yongxin Shi, Dezhi Peng, Yang Liu, Lianwen Jin

TL;DR
This paper introduces C$^{3}$Bench, a comprehensive benchmark of 50,000 classical Chinese text pairs across five tasks, to evaluate and advance the capabilities of large language models in classical Chinese understanding.
Contribution
It provides the first extensive benchmark for classical Chinese understanding, enabling systematic evaluation of LLMs and highlighting their current limitations.
Findings
LLMs still underperform compared to supervised models
Classical Chinese understanding requires specialized attention
Benchmark facilitates future research and development
Abstract
Classical Chinese Understanding (CCU) holds significant value for the preservation and exploration of the outstanding traditional Chinese culture. Recently, researchers have attempted to leverage the potential of Large Language Models (LLMs) for CCU by capitalizing on their remarkable comprehension and semantic capabilities. However, no comprehensive benchmark is available to assess the CCU capabilities of LLMs. To fill this gap, this paper introduces C$^{3}$Bench, a Comprehensive Classical Chinese understanding benchmark, which comprises 50,000 text pairs for five primary CCU tasks: classification, retrieval, named entity recognition, punctuation, and translation. Furthermore, the data in C$^{3}$Bench originates from ten different domains, covering most of the categories in classical Chinese. Leveraging the proposed C$^{3}$Bench, we extensively evaluate the quantitative performance…
Taxonomy
Topics: Natural Language Processing Techniques
