WikiAsp: A Dataset for Multi-domain Aspect-based Summarization

Hiroaki Hayashi; Prashant Budania; Peng Wang; Chris Ackerson; Raj Neervannan; Graham Neubig

arXiv:2011.07832·cs.CL·November 17, 2020

TL;DR

WikiAsp is a large-scale, multi-domain dataset for aspect-based summarization built from Wikipedia articles. It aims to advance open-domain summarization research, and its baseline experiments surface challenges such as pronoun handling and temporal consistency.

Contribution

The paper introduces WikiAsp, a novel dataset spanning 20 domains for multi-domain aspect-based summarization, and evaluates baseline models to identify key challenges.

Findings

01. Existing models struggle with pronoun resolution in quotations.

02. Temporal consistency remains a significant challenge.

03. Baseline models reveal the complexity of multi-domain summarization.

Abstract

Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due to large differences in the type of aspects for different domains (e.g., sentiment, product features), the development of previous models has tended to be domain-specific. In this paper, we propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization that attempts to spur research in the direction of open-domain aspect-based summarization. Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation. We propose several straightforward baseline models for this task and conduct experiments on the dataset. Results…
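
The idea of using section titles and boundaries as a proxy for aspect annotation can be illustrated with a small sketch. This is not the authors' pipeline: the wikitext-style `== Title ==` heading format, the function name `sections_as_aspects`, the lowercasing of titles, and the example article are all assumptions made for illustration.

```python
import re
from typing import Dict

# Assumed heading format: level-2 wikitext headings such as "== History ==".
# Deeper headings ("=== ... ===") are deliberately not matched in this sketch.
HEADING = re.compile(r"^==\s*([^=\n]+?)\s*==\s*$", re.MULTILINE)


def sections_as_aspects(article: str) -> Dict[str, str]:
    """Map each section title (lowercased) to the text inside that section."""
    matches = list(HEADING.finditer(article))
    aspects: Dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(article)
        body = article[start:end].strip()
        if body:  # skip empty sections
            aspects[match.group(1).lower()] = body
    return aspects


# Hypothetical article text, invented for this example.
example = """Lead paragraph that belongs to no section.
== History ==
The company was founded in 1990.
== Products ==
It sells widgets.
"""

pairs = sections_as_aspects(example)
for aspect, text in pairs.items():
    print(f"{aspect}: {text}")
```

Each resulting pair treats the section title as the aspect label and the section body as the aspect-specific text; text before the first heading is dropped because it carries no title.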

Code & Models

Repositories

neulab/wikiasp · tf · Official

Datasets

neulab/wiki_asp
dataset · 186 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies