StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning
Hong Chen, Duc Minh Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama

TL;DR
StoryER introduces a comprehensive automatic story evaluation framework that mimics human preferences through ranking, rating, and reasoning, supported by a large annotated dataset and a fine-tuned Longformer-Encoder-Decoder model.
Contribution
The paper presents a novel multi-task story evaluation method that aligns more closely with human judgment, along with a large annotated dataset and benchmark results.
Findings
Model outputs correlate highly with human preferences on the evaluation tasks
Joint learning across the three sub-tasks improves performance on each
Dataset and models are publicly available for research
Abstract
Existing automatic story evaluation methods place a premium on story lexical level coherence, deviating from human preference. We go beyond this limitation by considering a novel Story Evaluation method that mimics human preference when judging a story, namely StoryER, which consists of three sub-tasks: Ranking, Rating and Reasoning. Given either a machine-generated or a human-written story, StoryER requires the machine to output 1) a preference score that corresponds to human preference, 2) specific ratings and their corresponding confidences and 3) comments for various aspects (e.g., opening, character-shaping). To support these tasks, we introduce a well-annotated dataset comprising (i) 100k ranked story pairs; and (ii) a set of 46k ratings and comments on various aspects of the story. We finetune Longformer-Encoder-Decoder (LED)…
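The ranking sub-task is supported by ranked story pairs, which suggests a pairwise training signal. The abstract does not state the objective actually used, so the following is only a minimal sketch under that assumption: a margin-based hinge loss over a pair of preference scores, plus a pairwise accuracy check. The function names, the margin value, and the example scores are all hypothetical and are not taken from the paper.

```python
def pairwise_ranking_loss(score_preferred, score_other, margin=1.0):
    """Hinge loss on one ranked pair (assumed objective, not the paper's).

    The loss is zero once the preferred story outscores the other
    story by at least `margin`; otherwise it grows linearly.
    """
    return max(0.0, margin - (score_preferred - score_other))


def pairwise_accuracy(score_pairs):
    """Fraction of (preferred, other) score pairs ordered correctly."""
    correct = sum(1 for preferred, other in score_pairs if preferred > other)
    return correct / len(score_pairs)


# Hypothetical preference scores for three ranked story pairs.
# In each tuple the first score belongs to the human-preferred story.
example_pairs = [(0.9, 0.2), (0.4, 0.6), (0.7, 0.1)]

losses = [pairwise_ranking_loss(p, o) for p, o in example_pairs]
accuracy = pairwise_accuracy(example_pairs)
```

In this toy setup the second pair is ordered incorrectly, so it receives the largest loss and lowers the pairwise accuracy. A real system would compute the scores with a trained model rather than fixed numbers.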
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
