An Ensemble Approach for Research Article Identification: a Case Study in Artificial Intelligence
Lie Tang, Xianke Zhou, Min Lu

TL;DR
This paper introduces an ensemble method combining decision trees, sciBERT, regex, and SVM to accurately identify AI research articles, outperforming existing search techniques and revealing insights into research trends.
Contribution
The study presents a novel ensemble approach for research article identification that improves accuracy and provides research trend insights in AI.
Findings
Captured around 97% of AI-related articles in the Web of Science (WoS) corpus with a precision of 0.92
Increased F1 score by 0.15 over existing search-term-based approaches
Identified more articles in subfields like feature extraction
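To put the first two findings on a common scale, the F1 score can be worked out from the reported figures. This is a rough sketch that assumes "captured around 97%" means a recall of 0.97 and that the 0.15 gain is an absolute difference in F1; the function and variable names are illustrative only, and the inputs are rounded values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported figures, with "captured ~97%" read as recall (an assumption).
ensemble_f1 = f1_score(precision=0.92, recall=0.97)

# Baseline F1 implied by an absolute 0.15 gain (also an assumption).
implied_baseline_f1 = ensemble_f1 - 0.15

print(f"ensemble F1 = {ensemble_f1:.3f}")
print(f"implied baseline F1 = {implied_baseline_f1:.3f}")
```

Under these assumptions the ensemble's F1 comes out near 0.94, which would place the search-term baseline near 0.79.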
Abstract
This study presents an ensemble approach that addresses the challenges of identifying and analyzing research articles in rapidly evolving fields, using the field of Artificial Intelligence (AI) as a case study. Our approach applies a decision tree, sciBERT, and regular expression matching to different fields of the articles, and uses an SVM to merge the results from the different models. We evaluated the effectiveness of our method on a manually labeled dataset, finding that our combined approach captured around 97% of AI-related articles in the Web of Science (WoS) corpus with a precision of 0.92. This represents a 0.15 increase in F1 score compared with existing search-term-based approaches. Following this, we analyzed the publication volume trends and common research themes. We found that compared with existing methods, our ensemble approach revealed an increased degree of…
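The merging step described in the abstract can be illustrated with a small stacking sketch. This is not the authors' implementation: the regular expression, the vocabulary, the field names (`title`, `category`, `abstract`), and all function names are hypothetical. A hand-written rule stands in for the trained decision tree, a keyword-overlap check stands in for sciBERT, and a plain hinge-loss linear classifier stands in for the SVM, so the example runs with the standard library alone.

```python
import re

# Hypothetical pattern and vocabulary; the source does not list the real ones.
AI_PATTERN = re.compile(
    r"\b(machine learning|neural network|artificial intelligence)\b", re.I
)
AI_TERMS = {"classifier", "embedding", "training", "neural", "learning"}


def regex_vote(article):
    """Regular expression matching on the title field."""
    return 1 if AI_PATTERN.search(article["title"]) else 0


def category_vote(article):
    """Hand-written rule standing in for a trained decision tree."""
    return 1 if "computer science" in article["category"].lower() else 0


def abstract_vote(article):
    """Keyword overlap standing in for a sciBERT text classifier."""
    words = set(re.findall(r"[a-z]+", article["abstract"].lower()))
    return 1 if len(words & AI_TERMS) >= 2 else 0


def base_votes(article):
    """One binary vote per base model, each looking at a different field."""
    return [regex_vote(article), category_vote(article), abstract_vote(article)]


def train_hinge_classifier(X, y, epochs=200, lr=1.0):
    """Fit a linear model with subgradient steps on the unregularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # inside the margin, so take a step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(model, votes):
    """Return +1 (AI-related) or -1 (not AI-related) for a vote vector."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, votes)) + b > 0 else -1


# Toy meta-training set: labeled +1 when at least two base models vote "AI".
VOTES = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0],
         [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]
MODEL = train_hinge_classifier(VOTES, LABELS)


def is_ai_article(article):
    """Run the base models, then let the merged classifier decide."""
    return predict(MODEL, base_votes(article)) == 1
```

In a realistic setting the meta-classifier would be trained on base-model outputs for the manually labeled articles rather than on a synthetic vote table, and the base models would emit learned scores rather than fixed rules.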
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods
