PDSum: Prototype-driven Continuous Summarization of Evolving Multi-document Sets Stream
Susik Yoon, Hou Pong Chan, Jiawei Han

TL;DR
PDSum introduces a prototype-driven, unsupervised approach for continuously summarizing evolving multi-document streams, effectively capturing relevant and novel information over time.
Contribution
This work defines the Evolving Multi-Document sets stream Summarization (EMDS) problem and proposes PDSum, an unsupervised algorithm that maintains lightweight prototypes to adapt to changes in the documents while preserving previously acquired knowledge.
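As a rough illustration of the general idea of a "lightweight prototype" that adapts to new documents while retaining older knowledge, the sketch below keeps one vector per document set and blends it with the mean of newly arrived sentence vectors. This is not the PDSum algorithm, whose details are not given on this page. The function names, the `decay` parameter, and the cosine-based sentence selection are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def update_prototype(prototype, new_vectors, decay=0.5):
    """Hypothetical update rule: blend the old prototype with the mean of new vectors.

    decay close to 1.0 preserves more previous knowledge;
    decay close to 0.0 adapts faster to the newest documents.
    """
    if not new_vectors:
        return prototype
    dim = len(new_vectors[0])
    mean = [sum(v[i] for v in new_vectors) / len(new_vectors) for i in range(dim)]
    if prototype is None:
        # First batch for this document set: the prototype is just the mean.
        return mean
    return [decay * p + (1 - decay) * m for p, m in zip(prototype, mean)]

def pick_summary(sentences, vectors, prototype, k=1):
    """Return the k sentences whose vectors are most similar to the prototype."""
    ranked = sorted(
        zip(sentences, vectors),
        key=lambda sv: cosine(sv[1], prototype),
        reverse=True,
    )
    return [s for s, _ in ranked[:k]]

# Toy usage with hand-made 2-d "embeddings" (assumed, not real model output).
proto = update_prototype(None, [[1.0, 0.0], [1.0, 0.0]])
proto = update_prototype(proto, [[0.0, 1.0]], decay=0.5)
summary = pick_summary(["old topic", "new topic"], [[1.0, 0.0], [0.0, 1.0]], [0.9, 0.1])
```

Because only one small vector is stored per set, each update costs time proportional to the number of new sentences, which is the kind of property a streaming summarizer needs.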
Findings
PDSum outperforms existing algorithms in relevance, novelty, and distinctiveness.
It is robust across various evaluation settings.
PDSum efficiently updates summaries in real-time streams.
Abstract
Summarizing text-rich documents has long been studied in the literature, but most of the existing efforts have been made to summarize a static and predefined multi-document set. With the rapid development of online platforms for generating and distributing text-rich documents, there arises an urgent need for continuously summarizing dynamically evolving multi-document sets where the composition of documents and sets is changing over time. This is especially challenging as the summarization should be not only effective in incorporating relevant, novel, and distinctive information from each concurrent multi-document set, but also efficient in serving online applications. In this work, we propose a new summarization problem, Evolving Multi-Document sets stream Summarization (EMDS), and introduce a novel unsupervised algorithm PDSum with the idea of prototype-driven continuous…
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
