WikiRank: Improving Keyphrase Extraction Based on Background Knowledge

Yang Yu; Vincent Ng

arXiv:1803.09000 · cs.CL · March 28, 2018 · 20 citations

Open Access

TL;DR

WikiRank is an unsupervised keyphrase extraction method that leverages Wikipedia as background knowledge, constructing a semantic graph to identify the most relevant keyphrases with improved accuracy.

Contribution

It introduces a novel approach that incorporates background knowledge from Wikipedia into keyphrase extraction via a semantic graph and optimization framework.
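The semantic graph idea can be illustrated with a small bipartite graph that links candidate phrases to the Wikipedia concepts they mention. This is a minimal sketch under stated assumptions, not the authors' construction: the phrase-to-concept mapping is hypothetical hand-written input (a real system would get it from an entity linker over Wikipedia), and the concept weight used here (the number of candidate phrases linked to a concept) is an illustrative choice. `build_semantic_graph` is a hypothetical helper name.

```python
from collections import defaultdict


def build_semantic_graph(phrase_concepts):
    """Build a bipartite phrase-concept graph.

    phrase_concepts maps each candidate phrase to the set of Wikipedia
    concepts it was linked to (hypothetical input; a real pipeline would
    obtain it from an entity linker).
    """
    concept_to_phrases = defaultdict(set)
    for phrase, concepts in phrase_concepts.items():
        for concept in concepts:
            # One edge per (phrase, concept) link.
            concept_to_phrases[concept].add(phrase)
    # Hypothetical concept weight: how many candidate phrases link to it.
    concept_weight = {c: len(ps) for c, ps in concept_to_phrases.items()}
    return dict(concept_to_phrases), concept_weight


# Hypothetical annotations for a short document about this paper.
phrase_concepts = {
    "keyphrase extraction": {"Keyword extraction"},
    "semantic graph": {"Semantic network", "Graph (discrete mathematics)"},
    "background knowledge": {"Commonsense knowledge"},
    "graph": {"Graph (discrete mathematics)"},
}
concept_to_phrases, weight = build_semantic_graph(phrase_concepts)
print(weight["Graph (discrete mathematics)"])  # 2: two phrases link to it
```

Concepts shared by several phrases end up with higher weight, which is one simple way for background knowledge to signal which topics the document dwells on.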

Findings

01

Over 2% improvement in F1-score over state-of-the-art models

02

Effective use of Wikipedia as background knowledge

03

Unsupervised method suitable for diverse documents
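The F1-score in the first finding is the harmonic mean of precision and recall between the predicted and gold keyphrase sets. The sketch below assumes exact matching after lowercasing and whitespace normalization. Published evaluations often add stemming or a top-k cutoff, which this example does not model. `keyphrase_prf` is a hypothetical helper name.

```python
def normalize(phrase):
    # Lowercase and collapse whitespace; stemming is deliberately omitted.
    return " ".join(phrase.lower().split())


def keyphrase_prf(predicted, gold):
    """Return (precision, recall, F1) for exact-match keyphrase sets."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    # If nothing matched, both terms are 0 and F1 is defined as 0.
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


p, r, f1 = keyphrase_prf(
    ["semantic graph", "Wikipedia", "keyphrase extraction", "F1 score"],
    ["keyphrase extraction", "semantic graph", "background knowledge"],
)
print(p, r, f1)  # precision 2/4, recall 2/3, F1 = 4/7
```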

Abstract

Keyphrases are an efficient representation of the main ideas of a document. While background knowledge can provide valuable information about documents, it is rarely incorporated into keyphrase extraction methods. In this paper, we propose WikiRank, an unsupervised method for keyphrase extraction based on background knowledge from Wikipedia. First, we construct a semantic graph for the document. Then we transform the keyphrase extraction problem into an optimization problem on the graph. Finally, we take the optimal keyphrase set as the output. Our method improves on other state-of-the-art models by more than 2% in F1-score.
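To make "an optimization problem on the graph" concrete, the sketch below selects at most k phrases so that the total weight of the Wikipedia concepts they cover is as large as possible. This weighted maximum coverage objective is a swapped-in illustrative stand-in, not the paper's exact formulation, and the graph, weights, and `select_keyphrases` helper are hypothetical. The greedy loop is a standard heuristic for this objective: under a cardinality constraint it is known to reach at least a (1 - 1/e) fraction of the optimum.

```python
def select_keyphrases(phrase_concepts, concept_weight, k):
    """Greedy weighted maximum coverage over a phrase-concept graph.

    Illustrative stand-in objective (an assumption, not the paper's):
    pick at most k phrases whose linked concepts have the largest total
    weight, counting each covered concept only once.
    """
    selected, covered = [], set()
    remaining = set(phrase_concepts)

    def gain(phrase):
        # Marginal gain: weight of concepts this phrase would newly cover.
        return sum(concept_weight[c] for c in phrase_concepts[phrase] - covered)

    while remaining and len(selected) < k:
        # max() keeps the first maximum, so sorting makes ties deterministic.
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # every remaining phrase is redundant
        selected.append(best)
        covered |= phrase_concepts[best]
        remaining.remove(best)
    return selected


# Hypothetical phrase-concept graph and concept weights.
phrase_concepts = {
    "keyphrase extraction": {"Keyword extraction", "Information retrieval"},
    "semantic graph": {"Semantic network", "Graph theory"},
    "graph": {"Graph theory"},
    "wikipedia": {"Wikipedia"},
}
concept_weight = {
    "Keyword extraction": 3,
    "Information retrieval": 1,
    "Semantic network": 2,
    "Graph theory": 2,
    "Wikipedia": 1,
}
result = select_keyphrases(phrase_concepts, concept_weight, k=3)
print(result)  # ['keyphrase extraction', 'semantic graph', 'wikipedia']
```

Because coverage is counted once per concept, "graph" is skipped after "semantic graph" is chosen: it adds no new concept. This illustrates why set-level optimization can yield less redundant keyphrases than ranking phrases independently.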

Taxonomy

Topics: Advanced Text Analysis Techniques