Principled Content Selection to Generate Diverse and Personalized Multi-Document Summaries

Vishakh Padmakumar; Zichao Wang; David Arbour; Jennifer Healey

arXiv:2505.21859 · cs.CL · May 29, 2025

TL;DR

This paper introduces a three-step, principled content selection approach using determinantal point processes to improve diversity and personalization in multi-document summarization with large language models.

Contribution

It proposes a novel multi-step method combining explicit content selection with LLM prompting, enhancing coverage and personalization in multi-document summaries.

Findings

01 Improved source coverage on DiverseSumm benchmark

02 Enhanced diversity and relevance in summaries

03 Effective personalization by incorporating user intent

Abstract

While large language models (LLMs) are increasingly capable of handling longer contexts, recent work has demonstrated that they exhibit the "lost in the middle" phenomenon (Liu et al., 2024) of unevenly attending to different parts of the provided context. This hinders their ability to cover diverse source material in multi-document summarization, as noted in the DiverseSumm benchmark (Huang et al., 2024). In this work, we contend that principled content selection is a simple way to increase source coverage on this task. As opposed to prompting an LLM to perform the summarization in a single step, we explicitly divide the task into three steps: (1) reducing document collections to atomic key points, (2) using determinantal point processes (DPP) to select key points that prioritize diverse content, and (3) rewriting to the final summary. By combining prompting steps, for…
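To make step (2) concrete, here is a minimal sketch of DPP-style selection over key points. It is an illustration under stated assumptions, not the authors' implementation: the abstract shown here does not specify the kernel, the embedding model, or the inference procedure. The sketch assumes a kernel built from cosine similarity between key-point embeddings, scaled by a per-item quality score (a hypothetical stand-in for relevance to a user intent), and uses greedy determinant maximization as an approximate way to pick a diverse subset. The toy embeddings, scores, and all function names are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_kernel(embeddings, quality):
    """Assumed kernel: L[i][j] = q_i * cos(e_i, e_j) * q_j."""
    n = len(embeddings)
    return [[quality[i] * cosine(embeddings[i], embeddings[j]) * quality[j]
             for j in range(n)] for i in range(n)]

def determinant(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0  # numerically singular
        if p != c:
            a[c], a[p] = a[p], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return det

def greedy_dpp(kernel, k):
    """Greedily add the item that maximizes det of the selected submatrix."""
    selected = []
    for _ in range(min(k, len(kernel))):
        best, best_det = None, 0.0
        for i in range(len(kernel)):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[r][c] for c in idx] for r in idx]
            d = determinant(sub)
            if d > best_det + 1e-12:
                best, best_det = i, d
        if best is None:
            break  # every remaining item is redundant with the selection
        selected.append(best)
    return selected

# Toy data (hypothetical): key points 0 and 1 are near-duplicates.
key_points = ["Storm closes port", "Port shut by storm", "Fuel prices rise"]
embeddings = [[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]]
quality = [1.0, 0.9, 0.8]  # assumed relevance scores, e.g. to a user intent

chosen = greedy_dpp(build_kernel(embeddings, quality), k=2)
print([key_points[i] for i in chosen])
```

On this toy input the selection keeps the first and third key points and skips the near-duplicate second one, even though the duplicate has a higher quality score than the third. The determinant of the selected submatrix shrinks when its rows are nearly parallel, which is what penalizes redundant content.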

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies