Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

Ramakanth Pasunuru; Asli Celikyilmaz; Michel Galley; Chenyan Xiong; Yizhe Zhang; Mohit Bansal; Jianfeng Gao

arXiv:2103.01863·cs.CL·March 3, 2021

TL;DR

This paper introduces two data augmentation methods for query-focused multi-document summarization. The resulting large-scale training datasets improve neural models' performance and yield new state-of-the-art transfer results on DUC datasets.

Contribution

The paper presents two new query-focused multi-document summarization (QMDS) datasets created via data augmentation, and introduces hierarchical encoders that enhance model efficiency and effectiveness.

Findings

01. Achieved state-of-the-art transfer results on DUC datasets.

02. Data augmentation and hierarchical encoders outperform baselines.

03. Models perform well on automatic metrics and human evaluations.

Abstract

Progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficiently large-scale, high-quality training datasets. We present two QMDS training datasets, which we construct using two data augmentation methods: (1) transferring the commonly used single-document CNN/Daily Mail summarization dataset to create the QMDSCNN dataset, and (2) mining search-query logs to create the QMDSIR dataset. These two datasets have complementary properties, i.e., QMDSCNN has real summaries but simulated queries, while QMDSIR has real queries but simulated summaries. To cover both the real-summary and real-query aspects, we build abstractive end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets. We also introduce new hierarchical encoders that enable a more efficient encoding of the query together with…

Code & Models

Repositories

ramakanth-pasunuru/QmdsCnnIr
PyTorch · Official

Videos

Data Augmentation for Abstractive Query-Focused Multi-Document Summarization · Underline

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques