LFED: A Literary Fiction Evaluation Dataset for Large Language Models

Linhao Yu; Qun Liu; Deyi Xiong

arXiv:2405.10166 · cs.CL · May 17, 2024

TL;DR

LFED introduces a new dataset for evaluating large language models' understanding of Chinese literary fiction, revealing current models' limited performance and providing insights into factors affecting comprehension.

Contribution

This paper presents LFED, the first comprehensive Chinese literary fiction dataset with a detailed question taxonomy for evaluating LLMs' comprehension and reasoning capabilities.

Findings

01

LLMs struggle with literary fiction questions: ChatGPT reaches only 57.08% in the zero-shot setting.

02

Attributes like novel type and publication year significantly influence LLM performance.

03

The dataset enables systematic evaluation of LLMs' literary understanding.
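
The second finding rests on breaking evaluation results down by an attribute of the source work. A minimal sketch of that kind of breakdown follows, assuming each evaluated question is stored as a record with a gold answer, a model prediction, and attribute fields. The field names (`novel_type`, `answer`, `prediction`), the exact-match scoring, and the toy records are all hypothetical and are not taken from LFED or its repository.

```python
from collections import defaultdict


def accuracy_by(records, attribute):
    """Return {attribute value: fraction of questions answered correctly}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        key = rec[attribute]
        totals[key] += 1
        # Exact match between prediction and gold answer (an assumption).
        if rec["prediction"] == rec["answer"]:
            correct[key] += 1
    return {key: correct[key] / totals[key] for key in totals}


# Toy records for illustration only; these are not LFED data or results.
toy_records = [
    {"novel_type": "historical", "answer": "A", "prediction": "A"},
    {"novel_type": "historical", "answer": "B", "prediction": "C"},
    {"novel_type": "science fiction", "answer": "D", "prediction": "D"},
    {"novel_type": "science fiction", "answer": "A", "prediction": "A"},
]

print(accuracy_by(toy_records, "novel_type"))
# {'historical': 0.5, 'science fiction': 1.0}
```

The same function can group by any other field present in the records, such as a publication-year bucket, by passing a different attribute name.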

Abstract

The rapid evolution of large language models (LLMs) has created a need for comprehensive assessments of their performance across various dimensions. In this paper, we propose LFED, a Literary Fiction Evaluation Dataset, which aims to evaluate the capability of LLMs in long fiction comprehension and reasoning. We collect 95 works of literary fiction that are either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries. We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions. Additionally, we conduct an in-depth analysis to ascertain how specific attributes of these works (e.g., novel type, number of characters, year of publication) affect LLM performance. Through a series of experiments with various state-of-the-art LLMs, we demonstrate that these models face…

Code & Models

Repositories

tjunlp-lab/lfed (official)

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques