Toward Network-based Keyword Extraction from Multitopic Web Documents
Sabina Šišović, Sanda Martinčić-Ipšić, Ana Meštrović

TL;DR
This paper explores the use of complex network analysis, specifically selectivity measures, for automatic keyword extraction from multitopic web documents, demonstrating promising results with a novel combined filtering approach.
Contribution
It introduces a network-based method utilizing selectivity measures for keyword extraction and proposes a new approach combining selectivity and weight filtering.
Findings
Among the centrality measures tested for ranking keyword candidates, the selectivity measure gives the most promising results
Combining in/out selectivity and weight values with filtering enables the extraction of word pairs
Directed, weighted co-occurrence networks of words are used to represent texts from web portals and forums
Abstract
In this paper we analyse the selectivity measure, calculated from a complex network, in the task of automatic keyword extraction. Texts collected from different web sources (portals, forums) are represented as directed and weighted co-occurrence complex networks of words. Words are nodes, and a link is established between two nodes if the corresponding words directly co-occur within a sentence. We test different centrality measures for ranking nodes as keyword candidates. Promising results are achieved using the selectivity measure. We then propose an approach that extracts word pairs according to the values of the in/out selectivity and weight measures, combined with filtering.
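The network construction and node ranking described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: selectivity is taken to be a node's strength (sum of link weights) divided by its degree (number of distinct links), computed separately for incoming and outgoing links; links point from a word to the word that directly follows it; and tokenization is a simple lowercase regex split. The function names `build_cooccurrence` and `selectivity` are hypothetical, not the authors' code.

```python
import re
from collections import defaultdict


def build_cooccurrence(sentences):
    """Build a directed, weighted co-occurrence network.

    Returns a dict mapping (u, v) to the number of times word v
    directly follows word u within a sentence (assumed link direction).
    """
    weights = defaultdict(int)
    for sentence in sentences:
        # Assumed tokenization: lowercase alphanumeric runs.
        words = re.findall(r"\w+", sentence.lower())
        for u, v in zip(words, words[1:]):
            weights[(u, v)] += 1
    return dict(weights)


def selectivity(weights):
    """Compute in- and out-selectivity for every node.

    Assumed definition: selectivity = strength / degree, where strength
    is the sum of link weights and degree is the number of distinct links.
    Nodes with no links in a given direction get 0.0.
    """
    in_strength, in_degree = defaultdict(int), defaultdict(int)
    out_strength, out_degree = defaultdict(int), defaultdict(int)
    for (u, v), w in weights.items():
        out_strength[u] += w
        out_degree[u] += 1
        in_strength[v] += w
        in_degree[v] += 1

    nodes = set(in_degree) | set(out_degree)
    in_sel = {
        n: in_strength[n] / in_degree[n] if n in in_degree else 0.0
        for n in nodes
    }
    out_sel = {
        n: out_strength[n] / out_degree[n] if n in out_degree else 0.0
        for n in nodes
    }
    return in_sel, out_sel


# Toy input for illustration only; not data from the paper.
sentences = [
    "network analysis works",
    "network analysis scales",
    "network theory",
]
weights = build_cooccurrence(sentences)
in_sel, out_sel = selectivity(weights)

# Rank keyword candidates by in-selectivity, highest first.
ranked = sorted(in_sel, key=in_sel.get, reverse=True)
```

Under this definition, a word scores highly when it co-occurs repeatedly with a small set of neighbours rather than once each with many different words.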
Taxonomy
Topics: Advanced Text Analysis Techniques
