Back to the Basics: A Quantitative Analysis of Statistical and Graph-Based Term Weighting Schemes for Keyword Extraction
Asahi Ushio, Federico Liberatore, Jose Camacho-Collados

TL;DR
This paper provides a comprehensive empirical comparison of statistical and graph-based term weighting schemes for keyword extraction, revealing their strengths, weaknesses, and practical implications.
Contribution
It offers an exhaustive, large-scale evaluation of statistical and graph-based term weighting methods for keyword extraction, including the less-known lexical specificity.
Findings
Lexical specificity outperforms tf-idf in certain contexts
Qualitative differences exist between statistical and graph-based methods
Practical recommendations for choosing term weighting schemes are provided
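Since lexical specificity is the less familiar of the two statistical schemes named above, a minimal sketch may help. It assumes the usual hypergeometric formulation: the score of a term is the negative log10 probability of seeing it at least f times in a document of t tokens, given that it occurs F times in a reference corpus of T tokens. The function names and the toy numbers are illustrative and are not taken from the paper's code.

```python
import math


def hypergeom_pmf(x, T, t, F):
    """P(term occurs exactly x times in a t-token sample drawn from a
    T-token corpus in which the term occurs F times)."""
    return math.comb(F, x) * math.comb(T - F, t - x) / math.comb(T, t)


def lexical_specificity(T, t, F, f):
    """-log10 of the hypergeometric upper tail P(X >= f).

    Assumed formulation for illustration; exact integer arithmetic via
    math.comb keeps it simple but only suits small counts.
    """
    tail = sum(hypergeom_pmf(x, T, t, F) for x in range(f, min(F, t) + 1))
    return -math.log10(tail)


# Toy setting: corpus of 100 tokens, document of 10 tokens,
# term occurring 5 times in the corpus overall.
low = lexical_specificity(100, 10, 5, 1)   # term seen once in the document
high = lexical_specificity(100, 10, 5, 3)  # term seen three times
```

A term that is more concentrated in the document than its corpus frequency would predict gets a higher score, so `high` exceeds `low` here.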
Abstract
Term weighting schemes are widely used in Natural Language Processing and Information Retrieval. In particular, term weighting is the basis for keyword extraction. However, there are relatively few evaluation studies that shed light on the strengths and shortcomings of each weighting scheme. In fact, in most cases researchers and practitioners resort to the well-known tf-idf as default, despite the existence of other suitable alternatives, including graph-based models. In this paper, we perform an exhaustive and large-scale empirical comparison of both statistical and graph-based term weighting methods in the context of keyword extraction. Our analysis reveals some interesting findings such as the advantages of the less-known lexical specificity with respect to tf-idf, or the qualitative differences between statistical and graph-based methods. Finally, based on our findings we…
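To make the tf-idf baseline mentioned in the abstract concrete, here is a minimal sketch of keyword ranking by term weight. It assumes one common variant (relative term frequency times log(N / df)) and plain whitespace tokenization; the paper may use a different variant and preprocessing. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of docs[doc_index] by an illustrative tf-idf score."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    tokens = tokenized[doc_index]
    tf = Counter(tokens)
    scores = {
        term: (count / len(tokens)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


docs = [
    "graph based ranking of words in a graph",
    "statistical weighting of words in a corpus",
    "keyword extraction with weighting of words in a document",
]
ranked = tfidf_keywords(docs, 0)
keywords = [term for term, _ in ranked]
```

Terms that appear in every document ("of", "words", "in", "a") receive a weight of zero, so the top-ranked candidates for the first document are the terms specific to it.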
Taxonomy
Topics: Advanced Text Analysis Techniques · Information Retrieval and Search Behavior
