Complex Network based Supervised Keyword Extractor

Swagata Duari; Vasudha Bhatnagar

arXiv:1909.12009 · cs.IR · September 27, 2019

TL;DR

This paper introduces a supervised method for keyword extraction using complex network modeling of text, leveraging node properties to improve accuracy across multiple languages and domains.

Contribution

The paper presents a novel supervised framework that models text as a complex network and exploits node properties for keyword extraction, outperforming recent methods.

Findings

01. The proposed method outperforms several recent keyword extraction techniques.

02. The model performs well across scientific and news corpora.

03. It generalizes effectively to texts in Hindi and Assamese.

Abstract

In this paper, we present a supervised framework for automatic keyword extraction from a single document. We model the text as a complex network and construct the feature set by extracting select node properties from it. Several node properties have been exploited by unsupervised, graph-based keyword extraction methods to discriminate keywords from non-keywords. We exploit the complex interplay of node properties to design a supervised keyword extraction method. The training set is created from the feature set by assigning a label to each candidate keyword depending on whether or not the candidate is listed as a gold-standard keyword. Since a document contains far fewer keywords than non-keywords, the curated training set is naturally imbalanced. We train a binary classifier to predict keywords after balancing the training set. The model is trained using two public datasets from…
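The pipeline the abstract describes (text as a network, node properties as features, gold-standard labels, balancing) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' setup: the adjacent-word co-occurrence graph, the two node properties (degree and local clustering coefficient), the toy sentence and gold keywords, and random oversampling as the balancing step. The abstract names none of these, and the classifier itself is left out.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(tokens, window=2):
    """Undirected word graph: an edge links two distinct words that
    occur within `window` consecutive tokens (assumed construction)."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        graph[word]  # ensure every word becomes a node
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def node_features(graph):
    """Map each word to (degree, local clustering coefficient).
    These two properties are illustrative choices only."""
    features = {}
    for word, neighbours in graph.items():
        degree = len(neighbours)
        # Count edges that exist between pairs of neighbours.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        possible = degree * (degree - 1) / 2
        features[word] = (degree, links / possible if possible else 0.0)
    return features


def label_candidates(features, gold_keywords):
    """Label 1 if the candidate is a gold-standard keyword, else 0."""
    return [(vec, 1 if word in gold_keywords else 0)
            for word, vec in features.items()]


def oversample(rows, seed=0):
    """Balance classes by randomly duplicating minority-class rows
    (an assumed balancing technique)."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(rows)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


# Hypothetical toy document and gold keywords.
tokens = ("complex network model text complex network "
          "keyword extraction network keyword").split()
gold = {"network", "keyword"}

features = node_features(build_cooccurrence_graph(tokens))
rows = label_candidates(features, gold)
balanced = oversample(rows)
```

The `balanced` list of `(feature_vector, label)` pairs is what a binary classifier would then be trained on. In practice, candidate selection, the feature set, and the balancing method would follow the paper rather than the placeholders used here.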
