Agent-as-Judge for Factual Summarization of Long Narratives

Yeonseok Jeong; Minsoo Kim; Seung-won Hwang; Byung-Hak Kim

arXiv:2501.09993 · cs.CL · October 1, 2025

TL;DR

This paper introduces NarrativeFactScore, an agent-based evaluation framework that uses a Character Knowledge Graph to assess and improve the factual accuracy of long narrative summaries generated by LLMs.

Contribution

It presents a novel agent-as-a-judge framework utilizing a Character Knowledge Graph to evaluate and refine summaries for factual consistency in long narratives.

Findings

01. NarrativeFactScore outperforms existing metrics in factual accuracy evaluation.

02. The framework effectively identifies missing or erroneous facts in summaries.

03. Validation on benchmark datasets shows improved factual reliability of summaries.

Abstract

Large Language Models (LLMs) have demonstrated near-human performance in summarization tasks based on traditional metrics such as ROUGE and BERTScore. However, these metrics do not adequately capture critical aspects of summarization quality, such as factual accuracy, particularly for long narratives (>100K tokens). Recent advances, such as LLM-as-a-Judge, address the limitations of metrics based on lexical similarity but still exhibit factual inconsistencies, especially in understanding character relationships and states. In this work, we introduce NarrativeFactScore, a novel "Agent-as-a-Judge" framework for evaluating and refining summaries. By leveraging a Character Knowledge Graph (CKG) extracted from input and generated summaries, NarrativeFactScore assesses the factual consistency and provides actionable guidance for refinement, such as identifying missing or erroneous facts. We…
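As a rough illustration of the idea described in the abstract, the sketch below compares character facts stated in a summary against a graph of facts taken from the source narrative, and returns both a score and the unsupported facts as feedback for refinement. This is a toy under stated assumptions, not the authors' implementation: the `Triple` type, the `consistency_report` function, the exact-match comparison, and the example data are all hypothetical, and the paper's actual scoring and graph extraction (which rely on an LLM agent) are not reproduced here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One hypothetical edge of a character graph: subject -relation-> obj."""
    subject: str
    relation: str
    obj: str


def normalize(t: Triple) -> Triple:
    # Case- and whitespace-insensitive matching; a real system would also
    # need alias resolution (e.g. "Lizzy" vs "Elizabeth"), omitted here.
    return Triple(*(field.strip().lower() for field in t))


def consistency_report(source_graph, summary_graph):
    """Return (score, unsupported), where score is the fraction of summary
    triples that also appear in the source graph (an assumed toy metric)."""
    source = {normalize(t) for t in source_graph}
    summary = [normalize(t) for t in summary_graph]
    unsupported = [t for t in summary if t not in source]
    if not summary:
        # Assumption: a summary that states no facts is vacuously consistent.
        return 1.0, unsupported
    score = (len(summary) - len(unsupported)) / len(summary)
    return score, unsupported


# Hypothetical example data, written by hand rather than extracted by a model.
source_graph = [
    Triple("Elizabeth", "sister_of", "Jane"),
    Triple("Darcy", "friend_of", "Bingley"),
]
summary_graph = [
    Triple("elizabeth", "sister_of", "jane"),
    Triple("Darcy", "brother_of", "Bingley"),  # erroneous relation
]

score, unsupported = consistency_report(source_graph, summary_graph)
print(score)        # 0.5
print(unsupported)  # the single erroneous Darcy/Bingley triple
```

The list of unsupported triples plays the role of the "actionable guidance" mentioned above: in this sketch it could be handed back to a summarizer as the facts to correct or drop.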

Code & Models

Repositories

yeonseokjeong/narrativefactscore (PyTorch, official)

Videos

Agent-as-Judge for Factual Summarization of Long Narratives · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Law