LongSumEval: Question-Answering Based Evaluation and Feedback-Driven Refinement for Long Document Summarization

Huyen Nguyen; Haoxuan Zhang; Yang Zhang; Haihua Chen; Junhua Ding

arXiv:2604.25130·cs.CL·April 29, 2026

TL;DR

LongSumEval introduces a question-answering based framework for evaluating and improving long document summaries, providing interpretable scores and actionable feedback aligned with human judgments.

Contribution

The paper presents a novel QA-based evaluation method that correlates more strongly with human judgments than existing metrics and enables self-refinement of summaries without retraining.

Findings

01. QA-based evaluation outperforms existing metrics in agreement with human judgments.

02. Structured feedback facilitates significant quality improvements through self-refinement.

03. Evaluation feedback can be used as executable instructions to guide generation.

Abstract

Evaluating long document summaries remains the primary bottleneck in summarization research. Existing metrics correlate weakly with human judgments and produce aggregate scores without explaining deficiencies or guiding improvement, preventing effective refinement in applications requiring verifiable accuracy. We introduce LongSumEval, a unified framework bridging evaluation and generation through structured question-answering feedback. The framework operationalizes summary quality as answerability and factual alignment of question-answer pairs, generating interpretable scores and actionable feedback that identifies coverage gaps and factual inconsistencies. This resolves the misalignment where evaluation operates independently of generation objectives. Meta-evaluation of our QA-based evaluation module across seven benchmarks demonstrates substantially stronger agreement with human…
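The idea of scoring a summary by the answerability and factual alignment of question-answer pairs can be illustrated with a small sketch. This is not the authors' implementation: the `QAPair` structure, the `token_f1` comparator, the `match_threshold` value, and the feedback wording are all assumptions made for illustration, and a real system would presumably generate and answer the questions with a language model rather than take them as input.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QAPair:
    """Hypothetical container for one question and its two answers."""
    question: str
    source_answer: str             # answer derived from the source document
    summary_answer: Optional[str]  # answer derived from the summary; None if unanswerable


def token_f1(a: str, b: str) -> float:
    """Token-overlap F1, used here as a simple stand-in for an answer comparator."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 0.0
    remaining = list(tb)
    common = 0
    for tok in ta:
        if tok in remaining:
            remaining.remove(tok)  # count each token of b at most once
            common += 1
    if common == 0:
        return 0.0
    precision, recall = common / len(ta), common / len(tb)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs: List[QAPair], match_threshold: float = 0.5) -> dict:
    """Return two scores plus textual feedback (assumed format, not the paper's)."""
    gaps, conflicts = [], []
    answered = 0
    for p in pairs:
        if p.summary_answer is None:
            # Coverage gap: the summary cannot answer the question at all.
            gaps.append(f"Coverage gap: summary does not answer '{p.question}'")
            continue
        answered += 1
        if token_f1(p.source_answer, p.summary_answer) < match_threshold:
            # Factual inconsistency: the two answers disagree.
            conflicts.append(
                f"Inconsistency on '{p.question}': source says "
                f"'{p.source_answer}', summary says '{p.summary_answer}'"
            )
    total = len(pairs)
    return {
        "answerability": answered / total if total else 0.0,
        "factual_alignment": (answered - len(conflicts)) / answered if answered else 0.0,
        "feedback": gaps + conflicts,
    }


if __name__ == "__main__":
    example = [
        QAPair("How many benchmarks were used?", "seven benchmarks", "seven benchmarks"),
        QAPair("What is the framework called?", "LongSumEval", None),
        QAPair("Does refinement require retraining?", "no retraining", "requires full retraining"),
    ]
    result = evaluate(example)
    print(result["answerability"], result["factual_alignment"])
    for line in result["feedback"]:
        print(line)
```

In this sketch the feedback strings are what a refinement step could pass back to the summarizer as instructions, which is one plausible reading of how evaluation output might guide generation without retraining.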

