CiMaTe: Citation Count Prediction Effectively Leveraging the Main Text
Jun Hirako, Ryohei Sasano, Koichi Takeda

TL;DR
This paper introduces CiMaTe, a BERT-based model that predicts future citation counts by leveraging the sectional structure of a paper's main text, outperforming previous methods in the computational linguistics and biology domains.
Contribution
The paper presents a novel BERT-based model that explicitly captures the sectional structure of a paper's main text to improve citation count prediction.
Findings
Outperforms previous methods as measured by Spearman's rank correlation coefficient
Achieves a 5.1-point improvement in the computational linguistics domain
Achieves a 1.8-point improvement in the biology domain
Abstract
Prediction of the future citation counts of papers is increasingly important for finding interesting papers among an ever-growing number of publications. Although a paper's main text is an important factor for citation count prediction, it is difficult to handle in machine learning models because the main text is typically very long; thus, previous studies have not fully explored how to leverage it. In this paper, we propose a BERT-based citation count prediction model, called CiMaTe, that leverages the main text by explicitly capturing a paper's sectional structure. Through experiments with papers from the computational linguistics and biology domains, we demonstrate CiMaTe's effectiveness: it outperforms previous methods in Spearman's rank correlation coefficient by 5.1 points in the computational linguistics domain and by 1.8 points in the biology domain.
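The results are reported in terms of Spearman's rank correlation coefficient, which compares the ranking of papers by predicted citation count with their ranking by actual citation count. The following is a minimal, standard-library sketch of that metric, not the authors' evaluation code. The function names and the toy data are hypothetical, and reading "points" as the coefficient multiplied by 100 is an assumption.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted, actual):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(actual))


# Hypothetical toy data: model scores and true citation counts for five papers.
predicted = [2.1, 0.4, 3.3, 1.0, 0.9]
actual = [10, 1, 25, 3, 8]
rho = spearman(predicted, actual)
print(f"Spearman's rho: {rho:.3f} ({100 * rho:.1f} points, assuming points = 100 * rho)")
```

Because the metric depends only on ranks, a model is rewarded for ordering papers correctly even when its predicted counts are on a different scale from the true counts.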
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Data Quality and Management
