MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

Linhan Zhang; Qian Chen; Wen Wang; Chong Deng; Shiliang Zhang; Bing Li; Wei Wang; Xin Cao

arXiv:2110.06651·cs.CL·March 1, 2023·6 cites

TL;DR

MDERank introduces a novel unsupervised keyphrase extraction method that uses masked document embeddings and a specialized BERT model, significantly improving performance on multiple benchmarks.

Contribution

The paper proposes MDERank, a new unsupervised KPE approach utilizing masked embeddings and a contrastively trained KPEBERT model, addressing long document representation issues.

Findings

01

MDERank outperforms existing unsupervised methods by 1.80 F1@15 on average.

02

KPEBERT enhances embedding quality for keyphrase extraction.

03

Combined with KPEBERT, MDERank achieves an average 3.53 F1@15 improvement over the SOTA baseline SIFRank.
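
F1@15 in the findings above is the F1 score computed over the 15 highest-ranked candidates. A minimal sketch of that computation follows. The function name `f1_at_k` and the exact lowercase matching are assumptions made for illustration; published KPE evaluations often stem phrases before matching, which this sketch omits.

```python
def f1_at_k(predicted, gold, k=15):
    """F1 score over the top-k predicted keyphrases.

    Assumption: phrases match when their lowercase forms are identical.
    """
    top_k = predicted[:k]
    gold_set = {g.lower() for g in gold}
    if not top_k or not gold_set:
        return 0.0
    # Deduplicate predictions so a repeated phrase is counted once.
    matched = {p.lower() for p in top_k} & gold_set
    precision = len(matched) / len(top_k)
    recall = len(matched) / len(gold_set)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranked predictions and gold keyphrases.
predicted = ["keyphrase extraction", "bert", "long documents", "ranking"]
gold = ["keyphrase extraction", "long documents", "contrastive learning"]
print(f1_at_k(predicted, gold, k=15))
```

A benchmark score is then the mean of this value over all documents in the dataset.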

Abstract

Keyphrase extraction (KPE) automatically extracts phrases in a document that provide a concise summary of the core content, which benefits downstream information retrieval and NLP tasks. Previous state-of-the-art (SOTA) methods select candidate keyphrases based on the similarity between learned representations of the candidates and the document. They suffer performance degradation on long documents due to discrepancy between sequence lengths which causes mismatch between representations of keyphrase candidates and the document. In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document. We further develop a KPE-oriented BERT (KPEBERT) model by proposing a novel…
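
The masking strategy described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_embed` is a bag-of-words stand-in for a BERT document embedding, all function names are hypothetical, and candidates are supplied by hand rather than extracted. The sketch assumes a lower similarity between the original and masked document marks a more important candidate, since masking it changed the document representation more.

```python
import math
import re
from collections import Counter

MASK = "[MASK]"


def tokenize(text):
    """Lowercase word tokenizer that keeps the mask placeholder as one token."""
    return re.findall(r"\[mask\]|[a-z0-9]+", text.lower())


def toy_embed(text):
    """Stand-in for a BERT document embedding: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mask_candidate(document, candidate):
    """Replace every occurrence of the candidate with one [MASK] per word."""
    replacement = " ".join([MASK] * len(candidate.split()))
    pattern = re.compile(r"\b" + re.escape(candidate) + r"\b", re.IGNORECASE)
    return pattern.sub(replacement, document)


def rank_candidates(document, candidates, embed=toy_embed):
    """Rank candidates by similarity of the masked document to the original."""
    doc_vec = embed(document)
    scored = []
    for cand in candidates:
        masked_vec = embed(mask_candidate(document, cand))
        scored.append((cand, cosine(doc_vec, masked_vec)))
    # Ascending order: the least similar masked document comes first.
    return sorted(scored, key=lambda pair: pair[1])


document = (
    "Keyphrase extraction selects phrases that summarize a document. "
    "Keyphrase extraction helps retrieval."
)
candidates = ["keyphrase extraction", "retrieval", "document"]
for phrase, similarity in rank_candidates(document, candidates):
    print(f"{similarity:.3f}  {phrase}")
```

Because both sides of the comparison are full documents, the similarity is computed between sequences of equal length, which is the length mismatch the abstract identifies in phrase-to-document approaches. Swapping `toy_embed` for a real encoder only requires passing a different `embed` callable and a matching similarity function.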

Code & Models

Repositories

linhanz/mderank · PyTorch · Official

Taxonomy

Topics: Advanced Text Analysis Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Contrastive Learning · WordPiece · Adam · Dense Connections · Softmax · Dropout · Layer Normalization