Fùxì: A Benchmark for Evaluating Language Models on Ancient Chinese Text Understanding and Generation
Shangqing Zhao, Yuhao Zhou, Yupei Ren, Zhe Chen, Chenghao Jia, Fang Zhe, Zhaogaung Long, Shu Liu, Man Lan

TL;DR
Fùxì is a comprehensive benchmark designed to evaluate large language models on both understanding and generating ancient Chinese texts, addressing a critical gap in cultural and linguistic assessment.
Contribution
It introduces a balanced multi-task benchmark, novel evaluation metrics for classical Chinese generation, and a systematic framework for assessing language models on ancient Chinese.
Findings
Models perform well on comprehension tasks but poorly on generation tasks.
Generation tasks requiring cultural knowledge are particularly challenging.
The benchmark reveals significant gaps in current LLM capabilities for ancient Chinese.
Abstract
Ancient Chinese text processing presents unique challenges for large language models (LLMs) due to its distinct linguistic features, complex structural constraints, and rich cultural context. While existing benchmarks have primarily focused on evaluating comprehension through multiple-choice questions, there remains a critical gap in assessing models' generative capabilities in classical Chinese. We introduce Fùxì, a comprehensive benchmark that evaluates both understanding and generation capabilities across 21 diverse tasks. Our benchmark distinguishes itself through three key contributions: (1) balanced coverage of both comprehension and generation tasks, including novel tasks like poetry composition and couplet completion, (2) specialized evaluation metrics designed specifically for classical Chinese text generation, combining rule-based verification with fine-tuned LLM…
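The abstract names rule-based verification as one component of the generation metrics, but the truncated text does not say which rules are checked. Purely as an illustration, the sketch below assumes two well-known structural constraints of classical verse: a regulated poem has a fixed number of lines with a fixed number of characters per line (a five-character quatrain has 4 lines of 5 characters), and the two halves of a couplet must have equal length. The function names `split_lines`, `check_regulated_form`, and `check_couplet_length`, and the punctuation set used to split lines, are hypothetical; this is not the benchmark's actual metric code, and it ignores tonal and semantic parallelism entirely.

```python
import re

# Hypothetical rule-based format checks; NOT the Fùxì benchmark's metric code.
# Punctuation (full-width and ASCII) and whitespace treated as line separators.
_SEPARATORS = r"[，。？！；、,.?!;\s]+"


def split_lines(text: str) -> list[str]:
    """Split a poem or couplet into lines, dropping punctuation and empty pieces."""
    return [seg for seg in re.split(_SEPARATORS, text) if seg]


def check_regulated_form(poem: str, num_lines: int, chars_per_line: int) -> bool:
    """Return True if the poem has exactly num_lines lines of chars_per_line characters.

    Example: a five-character quatrain has 4 lines of 5 characters each.
    """
    lines = split_lines(poem)
    return len(lines) == num_lines and all(len(line) == chars_per_line for line in lines)


def check_couplet_length(first: str, second: str) -> bool:
    """Return True if both halves of a couplet are non-empty and equally long.

    Internal punctuation is ignored, so long couplets with commas still compare correctly.
    """
    a = "".join(split_lines(first))
    b = "".join(split_lines(second))
    return len(a) > 0 and len(a) == len(b)


if __name__ == "__main__":
    quatrain = "床前明月光，疑是地上霜。举头望明月，低头思故乡。"
    print(check_regulated_form(quatrain, num_lines=4, chars_per_line=5))  # True
    print(check_couplet_length("海内存知己", "天涯若比邻"))  # True
```

Checks like these are cheap and deterministic, which is presumably why a metric would pair them with a learned judge for the aspects (meaning, style, cultural fit) that rules cannot capture.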
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
