MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Shiliang Zhang, Bing Li, Wei Wang, Xin Cao

TL;DR
MDERank introduces a novel unsupervised keyphrase extraction method that uses masked document embeddings and a specialized BERT model, significantly improving performance on multiple benchmarks.
Contribution
The paper proposes MDERank, a new unsupervised KPE approach utilizing masked embeddings and a contrastively trained KPEBERT model, addressing long document representation issues.
Findings
MDERank alone outperforms the previous SOTA unsupervised KPE approach by an average of 1.80 F1@15.
KPEBERT enhances embedding quality for keyphrase extraction.
With KPEBERT as the embedding model, MDERank achieves an average 3.53 F1@15 improvement over SOTA methods.
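For readers unfamiliar with the metric, F1@15 is the harmonic mean of precision and recall computed over the top 15 predicted keyphrases. The sketch below is a minimal illustration, not the paper's evaluation script: the function name `f1_at_k` is hypothetical, precision is divided by the number of phrases actually returned (some evaluations divide by K instead), and the stemming and lowercasing that KPE benchmarks usually apply before matching are omitted.

```python
def f1_at_k(predicted, gold, k=15):
    """F1 over the top-k predictions (hypothetical helper, exact-match only)."""
    top_k = predicted[:k]
    gold_set = set(gold)
    # Count top-k predictions that exactly match a gold keyphrase.
    hits = sum(1 for phrase in top_k if phrase in gold_set)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ranked = ["keyphrase extraction", "bert", "weather", "masking"]
    reference = ["keyphrase extraction", "masking", "contrastive learning"]
    print(f1_at_k(ranked, reference, k=2))
```

The reported gains (1.80 and 3.53) are differences in this score, averaged across benchmarks.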
Abstract
Keyphrase extraction (KPE) automatically extracts phrases in a document that provide a concise summary of the core content, which benefits downstream information retrieval and NLP tasks. Previous state-of-the-art (SOTA) methods select candidate keyphrases based on the similarity between learned representations of the candidates and the document. They suffer performance degradation on long documents because the discrepancy in sequence length causes a mismatch between the representations of keyphrase candidates and the document. In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document. We further develop a KPE-oriented BERT (KPEBERT) model by proposing a novel…
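The masking-and-ranking idea described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `toy_embed` is a bag-of-words stand-in for the BERT (or KPEBERT) document embedding, the helper names are hypothetical, and replacing each candidate word with one `[MASK]` token is an assumed masking scheme. The intuition is that masking an important phrase changes the document embedding more, so candidates are ranked by ascending similarity between the original and masked documents.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a BERT document embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"\[mask\]|\w+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mask_candidate(document, candidate, mask_token="[MASK]"):
    """Replace every occurrence of the candidate with one mask token per word."""
    replacement = " ".join([mask_token] * len(candidate.split()))
    pattern = re.compile(r"\b" + re.escape(candidate) + r"\b", re.IGNORECASE)
    return pattern.sub(replacement, document)


def mderank_sketch(document, candidates, embed=toy_embed):
    """Rank candidates: lower similarity after masking means more important."""
    doc_vec = embed(document)
    scored = [
        (cosine(doc_vec, embed(mask_candidate(document, c))), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored)]


if __name__ == "__main__":
    doc = ("Keyphrase extraction ranks phrases. "
           "Keyphrase extraction helps retrieval. The weather is nice.")
    print(mderank_sketch(doc, ["weather", "keyphrase extraction"]))
```

Because both sides of the comparison are full documents, the similarity is computed between sequences of the same length, which is the mismatch the abstract says candidate-versus-document methods suffer from. Swapping `toy_embed` for a real encoder would only require passing a different `embed` callable and a matching similarity function.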
Taxonomy
Topics: Advanced Text Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Contrastive Learning · WordPiece · Adam · Dense Connections · Softmax · Dropout · Layer Normalization
