OASum: Large-Scale Open Domain Aspect-based Summarization
Xianjun Yang, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Xiaoman Pan, Linda Petzold, Dong Yu

TL;DR
This paper introduces OASum, a large-scale open-domain aspect-based summarization dataset with over 3.7 million instances, enabling improved aspect-focused summarization and downstream task performance.
Contribution
The creation of a high-quality, large-scale open-domain dataset for aspect-based summarization and demonstration of its effectiveness through benchmark and transfer learning experiments.
Findings
Models pre-trained on OASum outperform their backbone models in aspect-focused generation.
Zero-shot and few-shot learning on downstream datasets show strong performance.
OASum dataset and checkpoints are publicly available for research.
Abstract
Aspect- or query-based summarization has recently attracted more attention, as it can generate differentiated summaries based on users' interests. However, existing datasets for aspect- or query-based summarization either focus on specific domains, are relatively small in scale, or include only a few aspect types. Such limitations hinder further exploration in this direction. In this work, we take advantage of crowd-sourced knowledge on Wikipedia.org and automatically create a high-quality, large-scale open-domain aspect-based summarization dataset named OASum, which contains more than 3.7 million instances with around 1 million different aspects on 2 million Wikipedia pages. We provide benchmark results on OASum and demonstrate its ability to generate diverse aspect-based summaries. To overcome the data scarcity problem in specific domains, we also perform…
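To make the task setup concrete, here is a minimal Python sketch of what one aspect-based summarization instance could look like and how the aspect might be fed to a sequence-to-sequence model. The class name, field names, separator, and toy text are all assumptions made for illustration; they are not taken from the OASum release or the paper.

```python
from dataclasses import dataclass


@dataclass
class AspectInstance:
    """One aspect-based summarization example (hypothetical field names)."""
    document: str  # source text, e.g. the body of a Wikipedia page
    aspect: str    # the aspect the summary should focus on
    summary: str   # reference summary restricted to that aspect


def build_model_input(instance: AspectInstance, sep: str = " | ") -> str:
    """Prepend the aspect to the document so a model can condition on it.

    The ordering and separator are illustrative choices, not the paper's format.
    """
    return f"{instance.aspect}{sep}{instance.document}"


# Toy text invented for this sketch: one document, two aspects.
doc = "The bridge opened in 1932. It is built of steel."
history = AspectInstance(doc, "History", "The bridge opened in 1932.")
design = AspectInstance(doc, "Design", "The bridge is built of steel.")

print(build_model_input(history))
print(build_model_input(design))
```

The point of the sketch is that the same document paired with different aspects yields different inputs and different target summaries, which is what distinguishes this task from generic summarization.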
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Wikis in Education and Collaboration
