EvolvR: Self-Evolving Pairwise Reasoning for Story Evaluation to Enhance Generation

Xinda Wang; Zhengxu Hou; Yangshijie Zhang; Bingren Yan; Jialin Liu; Chenzhuo Zhao; Zhibo Yang; Bin-Bin Yang; Feng Xiao

arXiv:2508.06046 · cs.CL · March 17, 2026

TL;DR

EvolvR introduces a self-evolving pairwise reasoning framework that improves story evaluation by generating and filtering high-quality reasoning data, leading to state-of-the-art performance and better story generation quality.

Contribution

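The paper presents a self-evolving framework that synthesizes and filters Chain-of-Thought reasoning data for story evaluation and generation. It targets two limitations of existing methods: prompt engineering for closed-source models adapts poorly, and fine-tuned open-source models lack the rigorous reasoning that story evaluation requires.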

Findings

01 Achieves state-of-the-art results on StoryER, HANNA, and OpenMEVA benchmarks.

02 Effectively enhances story quality when used as a reward model.

03 Demonstrates robustness and logical rigor through multi-agent data filtering.

Abstract

Although the effectiveness of Large Language Models (LLMs) as judges (LLM-as-a-judge) has been validated, their performance remains limited in open-ended tasks, particularly in story evaluation. Accurate story evaluation is crucial not only for assisting human quality judgment but also for providing key signals to guide story generation. However, existing methods face a dilemma: prompt engineering for closed-source models suffers from poor adaptability, while fine-tuning approaches for open-source models lack the rigorous reasoning capabilities essential for story evaluation. To address this, we propose the Self-Evolving Pairwise Reasoning (EvolvR) framework. Grounded in pairwise comparison, the framework first self-synthesizes score-aligned Chain-of-Thought (CoT) data via a multi-persona strategy. To ensure data quality, these raw CoTs undergo a self-filtering process, utilizing…
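
The synthesize-then-filter idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated before it describes the actual filtering criteria, so the single check used here (keep a reasoning trace only when its verdict agrees with the ordering of the reference scores) is an assumption, as are the names `PairwiseCoT`, `is_score_aligned`, and `filter_by_score_alignment`, the persona labels, and the example scores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairwiseCoT:
    """One synthesized reasoning trace comparing story A with story B (hypothetical type)."""
    persona: str    # assumed persona label, e.g. "critic"
    reasoning: str  # the chain-of-thought text
    preferred: str  # verdict reached by the reasoning: "A" or "B"


def is_score_aligned(cot: PairwiseCoT, score_a: float, score_b: float) -> bool:
    """Return True when the verdict matches the ordering of the reference scores.

    Assumption: tied reference scores give no preference signal, so they are rejected.
    """
    if score_a == score_b:
        return False
    gold = "A" if score_a > score_b else "B"
    return cot.preferred == gold


def filter_by_score_alignment(
    cots: List[PairwiseCoT], score_a: float, score_b: float
) -> List[PairwiseCoT]:
    """Keep only the traces whose verdict agrees with the reference scores."""
    return [c for c in cots if is_score_aligned(c, score_a, score_b)]


# Made-up example: three personas judge the same story pair.
cots = [
    PairwiseCoT("critic", "A has a clearer arc and a stronger ending.", "A"),
    PairwiseCoT("casual reader", "B was funnier.", "B"),
    PairwiseCoT("editor", "A is more coherent; B drifts off topic.", "A"),
]
kept = filter_by_score_alignment(cots, score_a=4.0, score_b=2.5)
```

In this sketch the traces that survive filtering would form the training data for the evaluator; a real pipeline would replace the hand-written traces with model outputs and would likely apply further checks beyond score agreement.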

Taxonomy

Topics: Artificial Intelligence in Games · Human Motion and Animation