Multi-Document Keyphrase Extraction: Dataset, Baselines and Review
Ori Shapira, Ramakanth Pasunuru, Ido Dagan, Yael Amsterdamer

TL;DR
This paper introduces the first dataset for multi-document keyphrase extraction, benchmarks baseline methods, and reviews existing literature to advance research in this underexplored area.
Contribution
It provides a new dataset, MK-DUC-01, for multi-document keyphrase extraction and evaluates baseline methods to facilitate future research.
Findings
First dataset for multi-document keyphrase extraction created
Baseline methods tested on the new dataset
Literature review of the task included
Abstract
Keyphrase extraction has been extensively researched within the single-document setting, with an abundance of methods, datasets and applications. In contrast, multi-document keyphrase extraction has been infrequently studied, despite its utility for describing sets of documents, and its use in summarization. Moreover, no prior dataset exists for multi-document keyphrase extraction, hindering the progress of the task. Recent advances in multi-text processing make the task an even more appealing challenge to pursue. To stimulate this pursuit, we present here the first dataset for the task, MK-DUC-01, which can serve as a new benchmark, and test multiple keyphrase extraction baselines on our data. In addition, we provide a brief, yet comprehensive, literature review of the task.
Taxonomy
Topics: Advanced Text Analysis Techniques
Methods: Test
