Keyword and Keyphrase Extraction Using Centrality Measures on Collocation Networks
Shibamouli Lahiri, Sagnik Ray Choudhury, Cornelia Caragea

TL;DR
This paper explores various centrality measures on collocation networks for keyword and keyphrase extraction, showing that much simpler measures can match or outperform PageRank-based methods on four benchmark datasets.
Contribution
It applies centrality measures other than PageRank to word and noun phrase collocation networks for keyword and keyphrase extraction, and compares their effectiveness against PageRank.
Findings
Several centrality measures perform as well as or better than PageRank.
Much simpler measures, such as degree, strength, and neighborhood size, are among the effective ones.
Centrality-based methods are competitive with strong unsupervised baselines.
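The simpler measures named in the findings can be read directly off a collocation network. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: tokens are assumed to be already preprocessed (for example lowercased and filtered to candidate words), two words are linked when they appear within a window of 2 consecutive tokens, edge weights count co-occurrences, and "neighborhood size" is taken to mean the number of distinct nodes within a configurable number of hops (2 by default). All function names (`build_collocation_network`, `degree`, `strength`, `neighborhood_size`, `top_keywords`) are hypothetical.

```python
from collections import defaultdict


def build_collocation_network(tokens, window=2):
    """Undirected weighted word collocation network (hypothetical helper).

    Words that co-occur within `window` consecutive tokens are linked;
    the edge weight counts how many times the pair co-occurs.
    """
    weights = defaultdict(int)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                weights[tuple(sorted((word, tokens[j])))] += 1
    adjacency = defaultdict(dict)
    for (u, v), w in weights.items():
        adjacency[u][v] = w
        adjacency[v][u] = w
    return dict(adjacency)


def degree(adjacency, node):
    # Number of distinct collocates of `node`.
    return len(adjacency.get(node, {}))


def strength(adjacency, node):
    # Sum of edge weights, i.e. the total co-occurrence count of `node`.
    return sum(adjacency.get(node, {}).values())


def neighborhood_size(adjacency, node, order=2):
    # Assumption: distinct nodes reachable within `order` hops, excluding `node`.
    seen = {node}
    frontier = [node]
    for _ in range(order):
        next_frontier = []
        for u in frontier:
            for v in adjacency.get(u, {}):
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(seen) - 1


def top_keywords(adjacency, measure, k=3):
    # Rank words by a centrality measure, highest first; ties broken alphabetically.
    ranked = sorted(adjacency, key=lambda w: (-measure(adjacency, w), w))
    return ranked[:k]


tokens = ["graph", "centrality", "keyword", "extraction", "graph",
          "centrality", "measure", "keyword", "graph"]
net = build_collocation_network(tokens)
print(top_keywords(net, degree, k=1))    # ['keyword']
print(top_keywords(net, strength, k=3))  # ['centrality', 'graph', 'keyword']
```

Each measure is a purely local count, so scoring every word takes a single pass over the network with no iterative solver, which is part of what makes these measures attractive compared to PageRank.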
Abstract
Keyword and keyphrase extraction is an important problem in natural language processing, with applications ranging from summarization to semantic search to document clustering. Graph-based approaches to keyword and keyphrase extraction avoid the problem of acquiring a large in-domain training corpus by applying variants of the PageRank algorithm on a network of words. Although graph-based approaches are knowledge-lean and easily adoptable in online systems, it remains largely open whether they can benefit from centrality measures other than PageRank. In this paper, we experiment with an array of centrality measures on word and noun phrase collocation networks, and analyze their performance on four benchmark datasets. Not only are there centrality measures that perform as well as or better than PageRank, but they are much simpler (e.g., degree, strength, and neighborhood size). Furthermore,…
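The PageRank baseline the abstract refers to can be sketched as weighted PageRank computed by power iteration on an undirected collocation network, in the spirit of TextRank. This is a minimal sketch under assumptions, not the authors' code: the damping factor is 0.85, a fixed number of iterations replaces a convergence check, every node is assumed to have at least one edge, and the small graph is hand-written for illustration. The function name `weighted_pagerank` is hypothetical.

```python
def weighted_pagerank(adjacency, damping=0.85, iterations=100):
    """PageRank by power iteration on an undirected weighted graph.

    Each node spreads its score to its neighbours in proportion to edge
    weight. Assumes every node has at least one edge (nonzero strength).
    """
    nodes = list(adjacency)
    n = len(nodes)
    # Strength = total edge weight of a node, used to normalize outgoing flow.
    strength = {u: sum(adjacency[u].values()) for u in nodes}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # The right-hand side uses the previous iteration's scores throughout.
        rank = {
            u: (1 - damping) / n
            + damping * sum(rank[v] * w / strength[v] for v, w in adjacency[u].items())
            for u in nodes
        }
    return rank


# Hypothetical toy collocation network: word -> {collocate: co-occurrence count}.
graph = {
    "graph": {"centrality": 2, "extraction": 1, "keyword": 1},
    "centrality": {"graph": 2, "keyword": 1, "measure": 1},
    "keyword": {"centrality": 1, "extraction": 1, "measure": 1, "graph": 1},
    "extraction": {"keyword": 1, "graph": 1},
    "measure": {"centrality": 1, "keyword": 1},
}
scores = weighted_pagerank(graph)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word}: {scores[word]:.3f}")
```

On this toy graph "keyword" comes out on top. With damping set to 1, a random walk on a connected undirected graph has a stationary distribution proportional to node strength, which offers one possible intuition for why a measure as simple as strength can track PageRank closely.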
Taxonomy
Topics: Advanced Text Analysis Techniques · Information Retrieval and Search Behavior
