Topic Aware Contextualized Embeddings for High Quality Phrase Extraction

Venktesh V, Mukesh Mohania, and Vikram Goyal

arXiv:2201.10982·cs.IR·January 27, 2022

TL;DR

This paper introduces an unsupervised graph-based method that combines contextualized embeddings and topic vectors to extract high-quality keyphrases, outperforming existing methods and enabling concept expansion with external knowledge.

Contribution

It proposes a novel unsupervised ranking approach using embeddings and topic information for improved phrase extraction and concept expansion.

Findings

1. Outperforms existing unsupervised methods on scientific datasets.
2. Achieves higher F1 scores than those methods, e.g., 0.2819 on SemEval2017.
3. Enables extraction of additional keyphrases from external sources like Wikipedia.

Abstract

Keyphrase extraction from a given document is the task of automatically extracting salient phrases that best describe the document. This paper proposes a novel unsupervised graph-based ranking method to extract high-quality phrases from a given document. We obtain the contextualized embeddings from pre-trained language models enriched with topic vectors from Latent Dirichlet Allocation (LDA) to represent the candidate phrases and the document. We introduce a scoring mechanism for the phrases using the information obtained from contextualized embeddings and the topic vectors. The salient phrases are extracted using a ranking algorithm on an undirected graph constructed for the given document. In the undirected graph, the nodes represent the phrases, and the edges between the phrases represent the semantic relatedness between them, weighted by a score obtained from the scoring mechanism.…
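
The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract is truncated and does not give the exact scoring formula, so everything specific here is an assumption: each phrase vector is the concatenation of a contextual embedding and a topic vector, edge weights are cosine similarities, and the ranking is a PageRank-style power iteration over the weighted undirected graph. The function name `rank_phrases` and the toy vectors are hypothetical; in practice the embeddings would come from a pre-trained language model and the topic vectors from LDA.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_phrases(embeddings, topics, damping=0.85, iters=50):
    """Rank phrases on a weighted undirected graph (illustrative only).

    Assumption: a phrase is represented by its contextual embedding
    concatenated with its topic vector; the paper's actual scoring
    mechanism may differ.
    """
    phrases = list(embeddings)
    vecs = {p: list(embeddings[p]) + list(topics[p]) for p in phrases}

    # Symmetric edge weights, no self-loops; negative similarities are clamped to 0.
    weight = {
        p: {q: max(cosine(vecs[p], vecs[q]), 0.0) for q in phrases if q != p}
        for p in phrases
    }
    degree = {p: sum(weight[p].values()) for p in phrases}

    n = len(phrases)
    score = {p: 1.0 / n for p in phrases}
    for _ in range(iters):
        updated = {}
        for p in phrases:
            # Each neighbour q passes on its score in proportion to the edge weight.
            incoming = sum(
                weight[q][p] / degree[q] * score[q]
                for q in phrases
                if q != p and degree[q] > 0
            )
            updated[p] = (1 - damping) / n + damping * incoming
        score = updated

    ranked = sorted(phrases, key=score.get, reverse=True)
    return ranked, score


# Toy, hand-made vectors (not real model output).
embeddings = {
    "keyphrase extraction": [0.9, 0.1],
    "graph ranking": [0.8, 0.3],
    "topic model": [0.7, 0.4],
    "coffee break": [0.1, 0.9],
}
topics = {
    "keyphrase extraction": [0.8, 0.2],
    "graph ranking": [0.7, 0.3],
    "topic model": [0.9, 0.1],
    "coffee break": [0.2, 0.8],
}

ranked, scores = rank_phrases(embeddings, topics)
for phrase in ranked:
    print(f"{scores[phrase]:.4f}  {phrase}")
```

In this toy input the off-topic phrase is only weakly connected to the others, so it ends up at the bottom of the ranking; the three related phrases reinforce one another through their high-weight edges.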

Code & Models

Repositories

venkteshv/unsupervised_keyphrase_extraction_cotagrank_ecir_2022 (Official)

Taxonomy

Topics: Advanced Text Analysis Techniques