QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications

Mingjun Zhao; Shengli Yan; Bang Liu; Xinwang Zhong; Qian Hao; Haolan Chen; Di Niu; Bowei Long; Weidong Guo

arXiv:2010.14108 · cs.AI · October 29, 2020

TL;DR

This paper introduces QBSUM, a large-scale Chinese query-based document summarization dataset, along with multiple unsupervised and supervised solutions that achieve fast inference and strong performance, with the aim of advancing research in this area.

Contribution

The paper provides the first large-scale Chinese query-based summarization dataset and proposes multiple unsupervised and supervised solutions for the task.

Findings

1. High-speed inference achieved by the proposed solutions
2. Superior performance demonstrated in offline and online tests
3. The QBSUM dataset facilitates future research in Chinese query-based summarization

Abstract

Query-based document summarization aims to extract or generate a summary of a document that directly answers, or is relevant to, the search query. It is an important technique that can benefit a variety of applications such as search engines, document-level machine reading comprehension, and chatbots. Currently, datasets designed for query-based summarization are few in number, and the existing ones are limited in both scale and quality. Moreover, to the best of our knowledge, there is no publicly available dataset for Chinese query-based document summarization. In this paper, we present QBSUM, a high-quality large-scale dataset consisting of 49,000+ data samples for the task of Chinese query-based document summarization. We also propose multiple unsupervised and supervised solutions to the task and demonstrate their high-speed inference and superior performance via both…
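To make the task definition concrete, here is a minimal sketch of what an unsupervised, extractive approach to query-based summarization can look like: split the document into sentences, score each sentence by how many of the query's character bigrams it contains, and return the top k sentences in their original order. This is an illustrative assumption, not one of the QBSUM authors' solutions; the function names (`split_sentences`, `char_ngrams`, `score`, `summarize`) and the bigram-overlap scoring are hypothetical choices made for this example.

```python
import re
from collections import Counter
from typing import List


def split_sentences(document: str) -> List[str]:
    # Split after Chinese or Western sentence-ending marks, keeping each mark
    # attached to its sentence (zero-width splits need Python 3.7+).
    parts = re.split(r"(?<=[。！？!?])", document)
    return [p.strip() for p in parts if p.strip()]


def char_ngrams(text: str, n: int = 2) -> Counter:
    # Character n-grams sidestep the need for a Chinese word segmenter.
    chars = [c for c in text if not c.isspace()]
    return Counter("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))


def score(sentence: str, query: str, n: int = 2) -> float:
    # Fraction of the query's n-grams that also occur in the sentence
    # (recall-style overlap; Counter intersection clips repeated n-grams).
    q = char_ngrams(query, n)
    if not q:
        return 0.0
    overlap = sum((q & char_ngrams(sentence, n)).values())
    return overlap / sum(q.values())


def summarize(document: str, query: str, k: int = 2, n: int = 2) -> str:
    # Hypothetical extractive baseline: rank sentences by query overlap,
    # keep the top k, then restore document order so the extract reads naturally.
    sentences = split_sentences(document)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i], query, n),
                    reverse=True)
    chosen = sorted(ranked[:k])
    return "".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    doc = "今天北京天气晴朗。北京的最高气温是二十五度。上海明天会下雨。"
    print(summarize(doc, "北京气温", k=1))  # 北京的最高气温是二十五度。
```

Character bigrams are used here only because they need no segmenter; a word-level tokenizer or a learned relevance model could replace `score` without changing the rest of the pipeline.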
