Unsupervised extraction of local and global keywords from a single text
Lida Aleksanyan, Armen E. Allahverdyan

TL;DR
This paper introduces an unsupervised, language-independent method for extracting local and global keywords from a single text. It is more effective than existing methods on long texts and also uncovers a text's basic themes.
Contribution
It presents a novel keyword extraction technique based on spatial word distribution and permutation response, capable of identifying local, global, and thematic keywords without supervision.
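A minimal sketch of the general idea (spatial distribution plus response to a random permutation), not the authors' actual statistic, which this summary does not specify. It assumes, purely for illustration, that a word's spatial distribution is summarised by the coefficient of variation of the gaps between its successive occurrences, and that the "response" is the difference between that value in the original text and its average over shuffled copies. The names `gap_clustering`, `permutation_response`, `min_count`, and `n_shuffles` are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def gap_clustering(positions):
    """Coefficient of variation of gaps between successive occurrences.

    Assumed statistic for illustration only: values well above the
    shuffled baseline indicate bursty (clustered) usage, values below
    it indicate unusually even usage.
    """
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return 0.0
    return pstdev(gaps) / mean(gaps)


def word_scores(tokens, min_count):
    """Clustering score for every word that occurs at least min_count times."""
    positions = defaultdict(list)
    for index, word in enumerate(tokens):
        positions[word].append(index)
    return {
        word: gap_clustering(pos)
        for word, pos in positions.items()
        if len(pos) >= min_count
    }


def permutation_response(tokens, min_count=3, n_shuffles=20, seed=0):
    """Observed clustering minus the mean clustering over random shuffles."""
    rng = random.Random(seed)
    observed = word_scores(tokens, min_count)
    baseline = defaultdict(float)
    shuffled = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys word order, keeps word frequencies
        for word, score in word_scores(shuffled, min_count).items():
            baseline[word] += score / n_shuffles
    return {word: observed[word] - baseline[word] for word in observed}


if __name__ == "__main__":
    # Toy text: "whale" is bursty, "the" is spread evenly, the rest is filler.
    tokens = [f"w{i}" for i in range(200)]
    for i in range(0, 200, 4):
        tokens[i] = "the"
    for i in list(range(50, 60)) + [181]:
        tokens[i] = "whale"
    response = permutation_response(tokens)
    for word, value in sorted(response.items(), key=lambda kv: -kv[1]):
        print(f"{word:6s} {value:+.2f}")
```

In this toy setup the bursty word gets a positive response and the evenly spread function word a negative one. How such a response would separate local from global keywords is defined in the paper itself and is not reproduced here.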
Findings
More effective than existing methods (such as YAKE) at extracting keywords from long texts
Able to infer local and global keywords and basic themes
Supported by human annotation and linguistic analysis
Abstract
We propose an unsupervised, corpus-independent method to extract keywords from a single text. It is based on the spatial distribution of words and the response of this distribution to a random permutation of words. Compared to existing methods (such as YAKE), our method has three advantages. First, it is significantly more effective at extracting keywords from long texts. Second, it allows inference of two types of keywords: local and global. Third, it uncovers basic themes in texts. Additionally, our method is language-independent and applies to short texts. The results are obtained via human annotators with prior knowledge of texts from our database of classical literary works (the agreement between annotators ranges from moderate to substantial). Our results are supported via human-independent arguments based on the average length of extracted content words and on the average…
Taxonomy
Topics: Advanced Text Analysis Techniques · Information Retrieval and Search Behavior · Digital Humanities and Scholarship
