Automatic Analysis of Linguistic Features in Journal Articles of Different Academic Impacts with Feature Engineering Techniques
Siyu Lei, Ruiying Yang, Chu-Ren Huang

TL;DR
This study uses feature engineering and machine learning to identify linguistic features that distinguish high-impact from moderate-impact journal articles, focusing on COVID-19 research papers.
Contribution
It introduces a novel application of feature engineering and supervised learning to analyze linguistic features related to academic impact in research articles.
Findings
24 linguistic features predict journal impact with high accuracy
Random forest outperforms other models in classification
Content word overlap and pronoun use are key indicators
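As a rough illustration of what the two indicators above measure, here is a minimal Python sketch. It is not the authors' extraction pipeline: the stopword list, the pronoun list, the regex tokenizer, and the Jaccard-style overlap formula are all assumptions made for illustration, and the function names are hypothetical.

```python
import re

# Tiny illustrative function-word list (an assumption; real tools use larger
# lists or part-of-speech tagging to decide what counts as a content word).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "to", "for", "with",
    "is", "are", "was", "were", "be", "been", "that", "this", "these", "those",
    "it", "its", "they", "them", "their", "we", "our", "he", "she", "his", "her",
    "i", "you", "as", "at", "by", "from", "not",
}
# Illustrative personal-pronoun list (also an assumption).
PRONOUNS = {"i", "we", "you", "he", "she", "it", "they", "them", "us", "him", "her", "me"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens (hyphenated words kept whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Set of tokens in the sentence that are not in the function-word list."""
    return {t for t in tokenize(sentence) if t not in FUNCTION_WORDS}


def adjacent_content_overlap(text):
    """Mean Jaccard overlap of content words between each pair of adjacent sentences."""
    sentences = split_sentences(text)
    scores = []
    for prev, curr in zip(sentences, sentences[1:]):
        a, b = content_words(prev), content_words(curr)
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def pronoun_rate(text):
    """Share of tokens that appear in the pronoun list."""
    tokens = tokenize(text)
    return sum(t in PRONOUNS for t in tokens) / len(tokens) if tokens else 0.0
```

Calling `adjacent_content_overlap(article_text)` and `pronoun_rate(article_text)` on a plain-text article would give two numeric features per document, which could then be fed to a classifier alongside other features.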
Abstract
English research articles (RAs) are an essential genre in academia, so attempts to employ NLP to assist the development of academic writing ability have received considerable attention in the last two decades. However, no study has employed feature engineering techniques to investigate the linguistic features of RAs of different academic impacts (i.e., papers with high or moderate citation counts published in journals with high or moderate impact factors). This study attempts to extract micro-level linguistic features from high- and moderate-impact journal RAs using feature engineering methods. We extracted 25 highly relevant features from the Corpus of English Journal Articles through feature selection methods. All papers in the corpus are empirical medical studies of COVID-19. The selected features were then validated for classification performance in terms of…
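The abstract does not say which feature selection methods were used. Purely as an illustration of the general idea, the sketch below shows one simple filter-style approach: score each feature by how far its mean differs between the two impact classes, then keep the top k. The scoring rule, the function names, and the feature names in the usage note are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def rank_features(rows, labels):
    """Rank features by absolute class-mean difference scaled by overall spread.

    rows   -- list of dicts mapping feature name to numeric value (one per article)
    labels -- list of 0/1 class labels (1 = high impact, 0 = moderate impact)
    This scoring rule is an assumption chosen for simplicity.
    """
    scores = {}
    for name in sorted(rows[0]):
        high = [r[name] for r, y in zip(rows, labels) if y == 1]
        moderate = [r[name] for r, y in zip(rows, labels) if y == 0]
        spread = pstdev(high + moderate)
        # A constant feature carries no class information, so score it 0.
        scores[name] = abs(mean(high) - mean(moderate)) / spread if spread > 0 else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_k(rows, labels, k):
    """Return the names of the k highest-scoring features."""
    return [name for name, _ in rank_features(rows, labels)[:k]]
```

For example, given made-up rows such as `{"overlap": 0.9, "noise": 0.5}` with labels `[1, 1, 0, 0]`, `select_top_k(rows, labels, 1)` keeps whichever feature separates the two classes best. The retained features would then be passed to a classifier for validation.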
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Feature Selection
