Toward Network-based Keyword Extraction from Multitopic Web Documents

Sabina Šišović; Sanda Martinčić-Ipšić; Ana Meštrović

arXiv:1407.3636 · cs.CL · July 15, 2014 · 5 cites

Open Access

TL;DR

This paper explores the use of complex network analysis, specifically selectivity measures, for automatic keyword extraction from multitopic web documents, demonstrating promising results with a novel combined filtering approach.

Contribution

It introduces a network-based method utilizing selectivity measures for keyword extraction and proposes a new approach combining selectivity and weight filtering.

Findings

01. Selectivity measure outperforms other centrality measures in keyword ranking

02. The combined filtering approach improves keyword extraction accuracy

03. Network-based analysis effectively captures keyword relevance in web documents

Abstract

In this paper we analyse the selectivity measure, calculated from a complex network, in the task of automatic keyword extraction. Texts collected from different web sources (portals, forums) are represented as directed and weighted co-occurrence complex networks of words. Words are nodes, and a link is established between two nodes if the corresponding words directly co-occur within a sentence. We test different centrality measures for ranking the nodes, which serve as keyword candidates. Promising results are achieved using the selectivity measure. We then propose an approach that extracts word pairs according to the values of the in/out selectivity and weight measures, combined with filtering.
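The network construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "directly co-occurring" means adjacent words within a sentence, that a link's weight counts how often the pair occurs, and that in/out selectivity is the ratio of a node's in/out strength to its in/out degree (a common definition that the abstract itself does not spell out). The function names and the whitespace tokenizer are hypothetical.

```python
from collections import defaultdict


def build_cooccurrence_network(sentences):
    """Return {(u, v): weight} for directed links between adjacent words.

    Assumption: words are lowercased and split on whitespace; a link u -> v
    is added each time v directly follows u inside one sentence.
    """
    weights = defaultdict(int)
    for sentence in sentences:
        words = sentence.lower().split()
        for u, v in zip(words, words[1:]):
            weights[(u, v)] += 1
    return dict(weights)


def in_out_selectivity(weights):
    """Return {node: (in_selectivity, out_selectivity)}.

    Assumption: selectivity = strength / degree, i.e. the average weight of
    a node's incoming (or outgoing) links; 0.0 when the node has no such links.
    """
    in_strength, out_strength = defaultdict(int), defaultdict(int)
    in_degree, out_degree = defaultdict(int), defaultdict(int)
    for (u, v), w in weights.items():
        out_strength[u] += w
        out_degree[u] += 1
        in_strength[v] += w
        in_degree[v] += 1

    nodes = set(in_degree) | set(out_degree)
    result = {}
    for n in nodes:
        d_in, d_out = in_degree.get(n, 0), out_degree.get(n, 0)
        sel_in = in_strength.get(n, 0) / d_in if d_in else 0.0
        sel_out = out_strength.get(n, 0) / d_out if d_out else 0.0
        result[n] = (sel_in, sel_out)
    return result


# Toy input, invented for illustration only.
sentences = [
    "complex network analysis",
    "complex network model",
    "network analysis works",
]
weights = build_cooccurrence_network(sentences)
selectivity = in_out_selectivity(weights)

# Rank keyword candidates by in-selectivity, highest first.
ranked = sorted(selectivity, key=lambda n: selectivity[n][0], reverse=True)
```

Ranking by in-selectivity here is only one possible choice; the abstract states that several centrality measures were tested and that word pairs are extracted using in/out selectivity together with link weight and filtering, the details of which are not given in this summary.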

Taxonomy

Topics: Advanced Text Analysis Techniques