AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization
Sayali Kulkarni, Sheide Chammas, Wan Zhu, Fei Sha, Eugene Ie

TL;DR
AQuaMuSe is a scalable method for automatically generating large query-based multi-document summarization datasets from question answering data and web corpora, enabling improved training and evaluation.
Contribution
It introduces an approach for automatically mining query-based multi-document summarization (qMDS) datasets, supporting both extractive and abstractive summarization, and releases a large-scale dataset for research.
Findings
The dataset contains 5,519 query-based summaries.
Baseline models show promising results on the dataset.
The approach enables scalable dataset creation for qMDS.
Abstract
Summarization is the task of compressing source document(s) into coherent and succinct passages. This is a valuable tool to present users with a concise and accurate sketch of the top-ranked documents related to their queries. Query-based multi-document summarization (qMDS) addresses this pervasive need, but the research is severely limited due to the lack of training and evaluation datasets, as existing single-document and multi-document summarization datasets are inadequate in form and scale. We propose a scalable approach called AQuaMuSe to automatically mine qMDS examples from question answering datasets and large document corpora. Our approach is unique in the sense that it can generate a dual dataset, for both extractive and abstractive summaries. We publicly release a specific instance of an AQuaMuSe dataset with 5,519 query-based summaries, each associated with an average of 6 input…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
