Principled Content Selection to Generate Diverse and Personalized Multi-Document Summaries
Vishakh Padmakumar, Zichao Wang, David Arbour, Jennifer Healey

TL;DR
This paper introduces a three-step, principled content selection approach using determinantal point processes to improve diversity and personalization in multi-document summarization with large language models.
Contribution
It proposes a novel multi-step method combining explicit content selection with LLM prompting, enhancing coverage and personalization in multi-document summaries.
Findings
Improved source coverage on DiverseSumm benchmark
Enhanced diversity and relevance in summaries
Effective personalization by incorporating user intent
Abstract
While large language models (LLMs) are increasingly capable of handling longer contexts, recent work has demonstrated that they exhibit the "lost in the middle" phenomenon (Liu et al., 2024) of unevenly attending to different parts of the provided context. This hinders their ability to cover diverse source material in multi-document summarization, as noted in the DiverseSumm benchmark (Huang et al., 2024). In this work, we contend that principled content selection is a simple way to increase source coverage on this task. As opposed to prompting an LLM to perform the summarization in a single step, we explicitly divide the task into three steps: (1) reducing document collections to atomic key points, (2) using determinantal point processes (DPP) to select key points that prioritize diverse content, and (3) rewriting to the final summary. By combining prompting steps, for…
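To make step (2) concrete, the following is a minimal sketch of DPP-style selection over key points. It is not the authors' implementation: the kernel construction (relevance-weighted cosine similarity), the greedy MAP approximation, the function names `build_kernel` and `greedy_dpp_select`, and the toy embeddings and relevance scores are all assumptions chosen for illustration.

```python
import numpy as np


def build_kernel(embeddings, relevance):
    """Build an L-ensemble kernel L = diag(q) * S * diag(q).

    S is the cosine similarity between key-point embeddings and q is a
    per-key-point relevance score. Both choices are illustrative assumptions.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarity = unit @ unit.T
    return relevance[:, None] * similarity * relevance[None, :]


def greedy_dpp_select(kernel, k):
    """Greedily approximate the size-k subset with the largest determinant.

    At each step, add the item that maximizes log det of the kernel
    restricted to the selected set. This is a common approximation to
    DPP MAP inference, not an exact sampler.
    """
    selected = []
    remaining = list(range(kernel.shape[0]))
    for _ in range(min(k, len(remaining))):
        best_item, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        if best_item is None:
            # Every remaining item is (numerically) redundant with the selection.
            break
        selected.append(best_item)
        remaining.remove(best_item)
    return selected


# Toy data (hypothetical): key points 0 and 1 are near-duplicates, 2 is distinct.
embeddings = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
relevance = np.array([1.0, 0.9, 0.8])

kernel = build_kernel(embeddings, relevance)
selected = greedy_dpp_select(kernel, k=2)
print(selected)
```

In this toy case the near-duplicate key point 1 is skipped in favor of the less relevant but distinct key point 2, which is the behavior a diversity-prioritizing selector is meant to show. One plausible way to reflect user intent would be to derive the relevance scores from similarity to a user query, though how the paper does this is not stated in the text above.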
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies
