Empirical analysis on a keyword-based semantic system
Zike Zhang, Linyuan Lv, Jian-Guo Liu, and Tao Zhou

TL;DR
This paper empirically analyzes the statistical and evolutionary properties of keywords in scientific articles, revealing patterns like Zipf's law and decay trends, which enhance understanding of semantic evolution and aid recommender system design.
Contribution
It provides a comprehensive empirical investigation of keyword dynamics across journals, highlighting universal and subject-specific patterns in semantic evolution.
Findings
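Keyword frequency in PNAS approximately follows Zipf's law with exponent 0.86.
The cumulative number of distinct keywords scales as a power law with the cumulative number of keyword occurrences.
The most popular keywords show similar decay trends across high-impact journals from different subjects, while low-impact journals behave differently.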
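To make the first finding concrete, here is a minimal sketch of how a Zipf exponent could be estimated from a list of keywords. It is not the authors' procedure: the function name `zipf_exponent`, the least-squares fit on the log-log rank-frequency curve, and the synthetic keyword list are all assumptions made for illustration.

```python
import math
from collections import Counter

def zipf_exponent(keywords):
    """Estimate alpha in f(r) ~ r^(-alpha) via least squares on log(rank) vs log(frequency).

    Hypothetical helper; the paper's actual fitting method is not stated here.
    """
    counts = sorted(Counter(keywords).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope is negative, so alpha is its negation

# Synthetic data (an assumption, not the PNAS data): the keyword at rank r
# appears about 1000 * r^(-0.86) times.
synthetic = []
for rank in range(1, 201):
    synthetic.extend([f"kw{rank}"] * max(1, round(1000 * rank ** -0.86)))

alpha = zipf_exponent(synthetic)
print(f"estimated Zipf exponent: {alpha:.2f}")
```

A plain log-log regression is the simplest choice but can be biased by the noisy low-frequency tail of real data; maximum-likelihood estimators are a common alternative.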
Abstract
Keywords in scientific articles have found their significance in information filtering and classification. In this article, we empirically investigated statistical characteristics and evolutionary properties of keywords in a very famous journal, namely Proceedings of the National Academy of Sciences of the United States of America (PNAS), including frequency distribution, temporal scaling behavior, and decay factor. The empirical results indicate that the keyword frequency in PNAS approximately follows Zipf's law with exponent 0.86. In addition, there is a power-law correlation between the cumulative number of distinct keywords and the cumulative number of keyword occurrences. Extensive empirical analysis of data from several other journals is also presented, with the decaying trends of the most popular keywords being monitored. Interestingly, top journals from various subjects share very similar…
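The power-law correlation between cumulative distinct keywords and cumulative keyword occurrences can be illustrated with a short sketch. The helper names `growth_curve` and `loglog_slope`, the per-article sampling of the curve, and the toy article list are assumptions for illustration only; the abstract does not report the fitted exponent, so none is assumed here.

```python
import math

def growth_curve(articles):
    """Return (cumulative occurrences, cumulative distinct keywords) after each article.

    `articles` is a chronologically ordered list of keyword lists (hypothetical input format).
    """
    seen = set()
    total = 0
    curve = []
    for keywords in articles:
        for kw in keywords:
            total += 1
            seen.add(kw)
        curve.append((total, len(seen)))
    return curve

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x); the exponent if y ~ x^slope."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy example (not real journal data): three articles with overlapping keywords.
articles = [["a", "b"], ["b", "c"], ["a", "d", "e"]]
curve = growth_curve(articles)
slope = loglog_slope(curve)
print(curve, round(slope, 2))
```

A slope below 1 means new keywords appear more slowly than keyword occurrences accumulate, which is the sublinear growth such a power-law relation typically describes.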
