LFED: A Literary Fiction Evaluation Dataset for Large Language Models
Linhao Yu, Qun Liu, Deyi Xiong

TL;DR
LFED introduces a new dataset for evaluating how well large language models understand long literary fiction in Chinese (works written in or translated into Chinese), revealing current models' limited performance and providing insights into factors affecting comprehension.
Contribution
This paper presents LFED, the first comprehensive Chinese literary fiction dataset with a detailed question taxonomy for evaluating LLMs' comprehension and reasoning capabilities.
Findings
LLMs struggle with literary fiction questions, with ChatGPT scoring only 57.08% in the zero-shot setting.
Attributes like novel type and publication year significantly influence LLM performance.
The dataset enables systematic evaluation of LLMs' literary understanding.
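The per-attribute analysis described in these findings can be illustrated with a minimal sketch that groups scored answers by one attribute and computes accuracy per group. This is not the authors' evaluation code. The record fields (`novel_type`, `predicted`, `answer`), the function name `accuracy_by_attribute`, and the exact-match scoring rule are all assumptions made for illustration, and the records are made-up toy values, not LFED data.

```python
from collections import defaultdict


def accuracy_by_attribute(records, attribute):
    """Return {attribute value: accuracy} for a list of scored answers.

    Each record is a dict holding the model's prediction, the gold answer,
    and the attribute to group by (hypothetical schema, not LFED's).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        key = rec[attribute]
        total[key] += 1
        # Assumed scoring rule: exact match between prediction and gold answer.
        if rec["predicted"] == rec["answer"]:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Toy records invented for this example; they are not results from the paper.
records = [
    {"novel_type": "historical", "predicted": "A", "answer": "A"},
    {"novel_type": "historical", "predicted": "B", "answer": "C"},
    {"novel_type": "science fiction", "predicted": "D", "answer": "D"},
]

by_type = accuracy_by_attribute(records, "novel_type")
print(by_type)
```

The same function could be called with a different attribute key (for example a publication-year bucket) to compare how accuracy varies along another dimension.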
Abstract
The rapid evolution of large language models (LLMs) has ushered in the need for comprehensive assessments of their performance across various dimensions. In this paper, we propose LFED, a Literary Fiction Evaluation Dataset, which aims to evaluate the capability of LLMs in long fiction comprehension and reasoning. We collect 95 works of literary fiction that are either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries. We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions. Additionally, we conduct an in-depth analysis to ascertain how specific attributes of these works (e.g., novel type, number of characters, year of publication) impact LLM performance in evaluations. Through a series of experiments with various state-of-the-art LLMs, we demonstrate that these models face…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques
