Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs

Mihir Parmar; Hanieh Deilamsalehy; Franck Dernoncourt; Seunghyun Yoon; Ryan A. Rossi; Trung Bui

arXiv:2407.04855·cs.CL·July 9, 2024

TL;DR

This paper introduces a new human-annotated dataset that pairs coherent extractive summaries with natural language user feedback, and reports that fine-tuning large language models on it improves Rouge-L by roughly 10%.

Contribution

It presents a novel dataset with human-annotated coherence and user intent, and fine-tunes LLMs to enhance extractive summary coherence using this data.

Findings

01

Significant (~10%) Rouge-L improvement when producing coherent summaries.

02

Fine-tuning LLMs with human feedback enhances summary quality.

03

Benchmarking with instruction-tuned models reveals key insights.
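
The Rouge-L figure cited in the findings is based on the longest common subsequence (LCS) between a generated summary and a reference. The sketch below illustrates that computation only; it assumes lowercase whitespace tokenization, no stemming, and a plain F1 (equal weight on precision and recall). The function names are hypothetical, and the paper's exact evaluation setup is not stated on this page, so scores from this sketch may not match the reported numbers.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the subsequence that ended before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Carry forward the best length seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Rouge-L F1 under the simplifying assumptions named above."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

Because the LCS respects token order, an extractive summary that keeps the reference's sentences in sequence tends to score higher than one that contains the same sentences shuffled.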

Abstract

Extractive summarization plays a pivotal role in natural language processing due to its wide-range applications in summarizing diverse content efficiently, while also being faithful to the original content. Despite significant advancement achieved in extractive summarization by Large Language Models (LLMs), these summaries frequently exhibit incoherence. An important aspect of the coherent summary is its readability for intended users. Although there have been many datasets and benchmarks proposed for creating coherent extractive summaries, none of them currently incorporate user intent to improve coherence in extractive summarization. Motivated by this, we propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback, offering valuable insights into how to improve coherence in extractive…

Code & Models

Repositories

mihir3009/extract-ai
PyTorch · Official

Videos

Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Data Quality and Management

Methods: Flan-T5