Learning to Summarize from LLM-generated Feedback

Hwanjun Song; Taewon Yun; Yuho Lee; Jihwan Oh; Gihun Lee; Jason Cai; Hang Su

arXiv:2410.13116·cs.CL·January 28, 2025

TL;DR

This paper presents FeedSum, a large-scale dataset of LLM-generated feedback, and demonstrates how leveraging high-quality, multi-dimensional feedback can significantly enhance the performance of smaller models in generating human-preferred summaries.

Contribution

The work introduces the FeedSum dataset, compares methods for using the feedback, and develops SummLlama3-8b, a smaller model that outperforms larger counterparts through feedback-based training.

Findings

01 High-quality, multi-dimensional feedback improves summary quality.

02 Supervised fine-tuning and preference optimization are effective methods for using the feedback.

03 With feedback-based training, smaller models such as SummLlama3-8b outperform larger models.

Abstract

Developing effective text summarizers remains a challenge due to issues like hallucinations, key information omissions, and verbosity in LLM-generated summaries. This work explores using LLM-generated feedback to improve summary quality by aligning the summaries with human preferences for faithfulness, completeness, and conciseness. We introduce FeedSum, a large-scale dataset containing multi-dimensional LLM feedback on summaries of varying quality across diverse domains. Our experiments show how feedback quality, dimensionality, and granularity influence preference learning, revealing that high-quality, multi-dimensional, fine-grained feedback significantly improves summary generation. We also compare two methods for using this feedback: supervised fine-tuning and direct preference optimization. Finally, we introduce SummLlama3-8b, a model that outperforms the nearly 10x larger…
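
The abstract names two ingredients, multi-dimensional feedback and direct preference optimization (DPO). The following is a minimal sketch of how they could fit together, not the authors' implementation. It assumes each candidate summary carries one numeric score per dimension, that the scores are averaged into a single composite (an assumed aggregation rule), and that the best and worst candidates form the preference pair. The function names `composite_score`, `build_preference_pair`, and `dpo_loss`, and the default `beta=0.1`, are hypothetical choices for illustration. The loss itself is the standard DPO objective for a single pair.

```python
import math

# The three dimensions named in the abstract.
DIMENSIONS = ("faithfulness", "completeness", "conciseness")


def composite_score(feedback):
    """Average the per-dimension scores (an assumed aggregation rule)."""
    return sum(feedback[d] for d in DIMENSIONS) / len(DIMENSIONS)


def build_preference_pair(candidates):
    """Return (chosen, rejected) summaries from scored candidates.

    `candidates` is a list of (summary_text, feedback_dict) tuples.
    Returns None when all candidates tie, since no preference can be inferred.
    """
    ranked = sorted(candidates, key=lambda c: composite_score(c[1]))
    worst, best = ranked[0], ranked[-1]
    if composite_score(best[1]) == composite_score(worst[1]):
        return None
    return best[0], worst[0]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls below that value as the policy shifts probability toward the chosen summary relative to the reference.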

Code & Models

Models

Datasets

DISLab/FeedSum
dataset · 110 dl

Videos

Learning to Summarize from LLM-generated Feedback

Taxonomy

Topics: Advanced Computational Techniques and Applications · Natural Language Processing Techniques · Statistical and Computational Modeling