QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications
Mingjun Zhao, Shengli Yan, Bang Liu, Xinwang Zhong, Qian Hao, Haolan Chen, Di Niu, Bowei Long, Weidong Guo

TL;DR
This paper introduces QBSUM, a large-scale Chinese query-based document summarization dataset, along with multiple solutions demonstrating high performance, aiming to advance research in this area.
Contribution
The paper provides the first large-scale Chinese query-based summarization dataset and proposes multiple effective solutions for the task.
Findings
The proposed solutions achieve high-speed inference
The proposed solutions show superior performance in both offline and online tests
The QBSUM dataset facilitates future research in Chinese query-based summarization
Abstract
Query-based document summarization aims to extract or generate a summary of a document that directly answers or is relevant to the search query. It is an important technique that can benefit a variety of applications such as search engines, document-level machine reading comprehension, and chatbots. Currently, datasets designed for query-based summarization are few in number, and those that exist are limited in both scale and quality. Moreover, to the best of our knowledge, there is no publicly available dataset for Chinese query-based document summarization. In this paper, we present QBSUM, a high-quality large-scale dataset consisting of 49,000+ data samples for the task of Chinese query-based document summarization. We also propose multiple unsupervised and supervised solutions to the task and demonstrate their high-speed inference and superior performance via both…
