Toward Selectivity Based Keyword Extraction for Croatian News
Slobodan Beliga, Ana Meštrović, Sanda Martinčić-Ipšić

TL;DR
This paper introduces a novel unsupervised network-based keyword extraction method for Croatian news, utilizing a new measure called node selectivity to identify keywords without linguistic knowledge.
Contribution
The paper proposes a new network measure, node selectivity, for keyword extraction that is purely statistical and structural, avoiding linguistic dependencies.
Findings
F1 score for keywords: 24.63%
F2 score for keywords: 21.19%
F1 score for word-tuples: 25.9%
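For readers unfamiliar with the metrics above, here is a minimal sketch of how F1 and F2 are typically computed for one document, assuming the standard F-beta definition over precision and recall of extracted versus annotated keywords. The function names and the toy keyword sets are hypothetical illustrations, not taken from the paper, and the paper's own averaging procedure is not reproduced here.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def evaluate(extracted, annotated, beta=1.0):
    """Score a set of extracted keywords against a manually annotated set."""
    extracted, annotated = set(extracted), set(annotated)
    hits = len(extracted & annotated)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(annotated) if annotated else 0.0
    return f_beta(precision, recall, beta)


# Hypothetical toy data: four extracted candidates, two annotated keywords.
extracted = {"a", "b", "c", "d"}
annotated = {"a", "b"}
f1 = evaluate(extracted, annotated, beta=1.0)
f2 = evaluate(extracted, annotated, beta=2.0)
print(f"F1={f1:.4f} F2={f2:.4f}")
```

A per-document score like this would then be averaged over the evaluation set to obtain figures comparable in form to those listed above.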
Abstract
This preliminary report on network-based keyword extraction for Croatian presents an unsupervised method for extracting keywords from a complex network. We build our approach on a new network measure, node selectivity, motivated by research on graph-based centrality approaches. Node selectivity is defined as the average weight distribution on the links of a single node. We extract nodes (keyword candidates) based on their selectivity values. Furthermore, we expand the extracted nodes to word-tuples ranked by the highest in/out selectivity values. Selectivity-based extraction does not require linguistic knowledge, since it is derived purely from the statistical and structural information encompassed in the source text, which is reflected in the structure of the network. The obtained sets are evaluated against manually annotated keywords: for the set of extracted keyword candidates, the average F1…
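The selectivity definition in the abstract can be illustrated with a short sketch. It assumes that "average weight distribution on the links of a single node" means node strength (the sum of link weights) divided by node degree (the number of links), computed separately for incoming and outgoing links. It also assumes a directed co-occurrence network in which a link from u to v counts how often word v directly follows word u. The function names and the toy token sequence are hypothetical, and the paper's actual network construction and preprocessing may differ.

```python
from collections import defaultdict


def build_cooccurrence_network(tokens):
    """Directed weighted network: weight of (u, v) = times v directly follows u."""
    weights = defaultdict(int)
    for u, v in zip(tokens, tokens[1:]):
        weights[(u, v)] += 1
    return dict(weights)


def node_selectivity(weights):
    """Return (in_selectivity, out_selectivity) as strength / degree per node."""
    in_strength, out_strength = defaultdict(int), defaultdict(int)
    in_degree, out_degree = defaultdict(int), defaultdict(int)
    for (u, v), w in weights.items():
        out_strength[u] += w
        out_degree[u] += 1
        in_strength[v] += w
        in_degree[v] += 1
    nodes = set(in_degree) | set(out_degree)
    # Nodes with no links in a given direction get selectivity 0.0.
    in_sel = {n: in_strength[n] / in_degree[n] if n in in_degree else 0.0
              for n in nodes}
    out_sel = {n: out_strength[n] / out_degree[n] if n in out_degree else 0.0
               for n in nodes}
    return in_sel, out_sel


# Hypothetical toy input; real input would be the tokenized news text.
tokens = "a b a b a c".split()
network = build_cooccurrence_network(tokens)
in_sel, out_sel = node_selectivity(network)

# Rank keyword candidates by out-selectivity, highest first.
ranked = sorted(out_sel, key=lambda n: (-out_sel[n], n))
print(ranked)
```

Under these assumptions, a word that repeatedly co-occurs with a small number of distinct neighbours scores higher than a word whose co-occurrences are spread thinly over many neighbours.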
Taxonomy
Topics: Advanced Text Analysis Techniques · Web Data Mining and Analysis · Service-Oriented Architecture and Web Services
