Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

Kung-Hsiang Huang; Philippe Laban; Alexander R. Fabbri; Prafulla Kumar Choubey; Shafiq Joty; Caiming Xiong; Chien-Sheng Wu

arXiv:2309.09369 · cs.CL · March 26, 2024 · 1 citation

TL;DR

This paper introduces a new multi-document summarization task focused on capturing diverse information across news articles about the same event, supported by a curated dataset and analysis of LLM evaluation biases.

Contribution

It presents a novel dataset for diverse multi-document summarization, analyzes LLM evaluation biases, and studies LLM capabilities in summarizing diverse news content.

Findings

1. LLMs cover less than 40% of the diverse information on average.
2. LLM-based evaluation metrics exhibit position and verbosity biases.
3. The paper provides best practices for automatic evaluation of diverse summarization.
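To make the coverage figure in the first finding concrete, here is a minimal sketch of how an average coverage score could be computed. It assumes each story comes with a list of reference information items and a boolean judgment of whether a summary covers each one. The function names and the example judgments are hypothetical placeholders, not the paper's implementation or data.

```python
from statistics import mean


def coverage(judgments):
    """Fraction of reference items judged as covered by one summary.

    `judgments` is a list of booleans, one per reference item
    (an assumed representation, not the paper's data format).
    """
    if not judgments:
        raise ValueError("need at least one reference item")
    return sum(judgments) / len(judgments)


def average_coverage(per_story_judgments):
    """Mean of per-story coverage scores across a set of stories."""
    return mean(coverage(j) for j in per_story_judgments)


# Hypothetical judgments for two stories (illustrative values only).
example = [
    [True, False, False, False],  # 1 of 4 items covered
    [True, True, False, False],   # 2 of 4 items covered
]

print(average_coverage(example))
```

Averaging per story, rather than pooling all items, keeps stories with many reference items from dominating the score; whether the paper aggregates this way is not stated here.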

Abstract

Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, the summarization of diverse information dispersed across multiple articles about an event remains underexplored. In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm. The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference. Next, to enable consistent automatic evaluation, we conducted a comprehensive analysis to pinpoint the position and verbosity biases when utilizing Large Language Model (LLM)-based metrics for evaluating the coverage and faithfulness…

Code & Models

Repositories

salesforce/diversesumm (official)

Videos

Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Residual Connection · Adam · Layer Normalization