Identification of the most important external features of highly cited scholarly papers through 3 (i.e., Ridge, Lasso, and Boruta) feature selection data mining methods
Sepideh Fahimifar, Khadijeh Mousavi, Fatemeh Mozaffari, and Marcel Ausloos

TL;DR
This study uses machine learning feature selection methods to identify external factors influencing citation counts of highly cited papers, highlighting key features like international researcher citations and journal self-citations.
Contribution
It introduces the use of Ridge, Lasso, and Boruta algorithms for feature selection in citation analysis, identifying the most relevant external factors affecting highly cited papers.
Findings
International researcher citations are most influential.
Journal and author self-citations are key features.
Open-access status and author's scientific age also matter.
Abstract
Highly cited papers are influenced by external factors that are not directly related to the document's intrinsic quality. In this study, 50 characteristics for measuring the performance of 68 highly cited papers from the Journal of the American Medical Informatics Association, indexed in Web of Science (WoS) from 2009 to 2019, were investigated. In the first step, a Pearson correlation analysis is performed to eliminate variables with zero or weak correlation with the target (dependent) variable ([number of citations in WoS]). Consequently, 32 variables are selected for the next step. By applying the Ridge technique, 13 features show a positive effect on the number of citations. Using three different algorithms, i.e., Ridge, Lasso, and Boruta, 6 factors appear to be the most relevant ones. The [Number of citations by international researchers], [Journal self-citations in citing…
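The first step described in the abstract, dropping variables with zero or weak Pearson correlation with the citation count, can be illustrated with a minimal sketch. The abstract does not give the cutoff or the variable encodings, so the threshold of 0.3, the feature names, and the toy data below are all assumptions made for illustration, not the authors' actual setup.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def filter_by_correlation(features, target, threshold=0.3):
    """Keep features whose |r| with the target reaches the threshold.

    The threshold value is an assumption; the abstract only says
    "zero or weak" correlations were eliminated.
    """
    kept = {}
    for name, values in features.items():
        r = pearson(values, target)
        if abs(r) >= threshold:
            kept[name] = r
    return kept

# Toy data for 68 hypothetical papers (the paper's real data is not used here).
n_papers = 68
international_citations = list(range(n_papers))
alternating_flag = [1 if i % 2 == 0 else -1 for i in range(n_papers)]
# Hypothetical target: citation count rising linearly with the first feature.
citations = [2 * x + 5 for x in international_citations]

features = {
    "international_citations": international_citations,  # hypothetical name
    "alternating_flag": alternating_flag,                 # hypothetical name
}
selected = filter_by_correlation(features, citations)
print(sorted(selected))
```

In this toy setup only the feature that tracks the target survives the filter. In the study, the surviving 32 variables were then passed to Ridge, Lasso, and Boruta, which this sketch does not attempt to reproduce.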
