TL;DR
This paper introduces StoryAlign, which pairs a benchmark (StoryRMB) with a reward model (StoryReward) for evaluating and improving how well story generation aligns with human preferences.
Contribution
It presents the first benchmark for reward model evaluation on stories and develops a new reward model that outperforms larger models in aligning with human preferences.
Findings
Reward models struggle to match human story preferences, with the best achieving only 66.3% accuracy.
The authors constructed a large dataset of 100,000 story preference pairs across diverse domains.
StoryReward outperforms larger models and improves story selection aligned with human preferences.
Abstract
Story generation aims to automatically produce coherent, structured, and engaging narratives. Although large language models (LLMs) have significantly advanced text generation, stories generated by LLMs still diverge from human-authored works regarding complex narrative structure and human-aligned preferences. A key reason is the absence of effective modeling of human story preferences, which are inherently subjective and under-explored. In this work, we systematically evaluate the modeling of human story preferences and introduce StoryRMB, the first benchmark for assessing reward models on story preferences. StoryRMB contains high-quality, human-verified instances, each consisting of a prompt, one chosen story, and three rejected stories. We find existing reward models struggle to select human-preferred stories, with the best model achieving only 66.3% accuracy. To address…
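The instance format described in the abstract (a prompt, one chosen story, three rejected stories) suggests a simple way to score a reward model. The following is a minimal sketch, not the authors' evaluation code: it assumes an instance counts as correct only when the chosen story receives a strictly higher score than every rejected story. The names `Instance`, `best_of_n_accuracy`, and `toy_score` are hypothetical, and `toy_score` is a word-count placeholder standing in for a real reward model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Instance:
    """One benchmark item: a prompt, the preferred story, and the rejected ones."""
    prompt: str
    chosen: str
    rejected: Sequence[str]


def best_of_n_accuracy(
    instances: Sequence[Instance],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of instances where the chosen story strictly outscores all rejected stories.

    Assumption: ties count as failures. The paper may define accuracy differently.
    """
    if not instances:
        raise ValueError("need at least one instance")
    correct = 0
    for inst in instances:
        chosen_score = score(inst.prompt, inst.chosen)
        if all(chosen_score > score(inst.prompt, r) for r in inst.rejected):
            correct += 1
    return correct / len(instances)


def toy_score(prompt: str, story: str) -> float:
    """Placeholder reward: number of words in the story (illustration only)."""
    return float(len(story.split()))


demo = [
    Instance("p1", "a b c d e", ["a", "a b", "a b c"]),  # chosen is longest: correct
    Instance("p2", "a b", ["a b c", "a", "a"]),          # a rejected story is longer: wrong
]
accuracy = best_of_n_accuracy(demo, toy_score)
```

Under this assumed definition, a model that scores the four candidates at random would be right about 25% of the time, which gives some context for the reported 66.3%.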
