CiMaTe: Citation Count Prediction Effectively Leveraging the Main Text

Jun Hirako; Ryohei Sasano; Koichi Takeda

arXiv:2410.04404 · cs.CL · October 8, 2024

TL;DR

This paper introduces CiMaTe, a BERT-based model that predicts future citation counts by leveraging the sectional structure of a paper's main text, outperforming previous methods in both the computational linguistics and biology domains.

Contribution

The paper presents a novel BERT-based model that explicitly captures the sectional structure of a paper's main text to improve citation count prediction.

Findings

1. Outperforms previous methods in Spearman's rank correlation coefficient
2. Achieves a 5.1-point improvement in the computational linguistics domain
3. Achieves a 1.8-point improvement in the biology domain
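Spearman's rank correlation coefficient scores how well the ranking of papers by predicted citations matches their ranking by actual citations, so only the ordering matters, not the raw counts. Below is a minimal pure-Python sketch (tied values receive the average of their ranks). The function names are illustrative, and reading one "point" as 0.01 of the coefficient (the coefficient multiplied by 100) is an assumption rather than something stated on this page. In practice `scipy.stats.spearmanr` computes the same statistic.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rp, ra = average_ranks(predicted), average_ranks(actual)
    n = len(rp)
    mp, ma = sum(rp) / n, sum(ra) / n
    cov = sum((x - mp) * (y - ma) for x, y in zip(rp, ra))
    var_p = sum((x - mp) ** 2 for x in rp)
    var_a = sum((y - ma) ** 2 for y in ra)
    if var_p == 0 or var_a == 0:
        raise ValueError("rho is undefined when all values in a sequence are equal")
    return cov / sqrt(var_p * var_a)


# Hypothetical toy data: model scores vs. observed citation counts.
predicted = [2.1, 0.4, 3.3, 1.0]
actual = [15, 2, 40, 2]
print(round(spearman_rho(predicted, actual), 4))  # 0.9487
```

Under the assumed reading, a 5.1-point gain would correspond to rho rising by 0.051 on the same test set.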

Abstract

Predicting the future citation counts of papers is increasingly important for finding interesting papers among an ever-growing body of literature. Although a paper's main text is an important factor for citation count prediction, it is difficult to handle in machine learning models because the main text is typically very long; thus, previous studies have not fully explored how to leverage it. In this paper, we propose a BERT-based citation count prediction model, called CiMaTe, that leverages the main text by explicitly capturing a paper's sectional structure. Through experiments with papers from the computational linguistics and biology domains, we demonstrate CiMaTe's effectiveness: it outperforms previous methods in Spearman's rank correlation coefficient by 5.1 points in the computational linguistics domain and by 1.8 points in the biology domain.
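The abstract does not spell out how CiMaTe encodes sections, so the following is only a hypothetical sketch of the general idea behind section-aware input: instead of feeding BERT (whose standard input limit is 512 tokens) only the opening of a long paper, split the main text at section headings and give each section its own truncated input, so later sections still contribute. The heading regex, the whitespace "tokenizer", the mean pooling, and the names `split_sections`, `build_section_inputs`, and `mean_pool` are all assumptions for illustration; a real pipeline would use a subword tokenizer, encode each section input with BERT, and may combine the section vectors differently.

```python
import re

# Hypothetical heading pattern: numbered lines such as "1 Introduction" or "3.2 Results".
HEADING = re.compile(r"\d+(?:\.\d+)*\.?\s+[A-Z]")


def _flush(sections: list[tuple[str, str]], heading: str, body_lines: list[str]) -> None:
    """Append (heading, body) if the body has any text, collapsing whitespace."""
    body = " ".join(" ".join(body_lines).split())
    if body:  # skip empty sections, e.g. no text before the first heading
        sections.append((heading, body))


def split_sections(main_text: str) -> list[tuple[str, str]]:
    """Split plain text into (heading, body) pairs at numbered heading lines."""
    sections: list[tuple[str, str]] = []
    heading, body_lines = "Preamble", []  # text before the first heading
    for line in main_text.splitlines():
        stripped = line.strip()
        if HEADING.match(stripped):
            _flush(sections, heading, body_lines)
            heading, body_lines = stripped, []
        else:
            body_lines.append(stripped)
    _flush(sections, heading, body_lines)
    return sections


def build_section_inputs(sections: list[tuple[str, str]], max_tokens: int) -> list[str]:
    """Truncate each section body separately so every section fits its own budget.

    Whitespace-separated words are only a rough proxy for BERT subword tokens.
    """
    inputs = []
    for heading, body in sections:
        tokens = body.split()[:max_tokens]
        inputs.append(f"{heading}: {' '.join(tokens)}")
    return inputs


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average per-section vectors element-wise into one paper-level vector."""
    if not vectors:
        raise ValueError("need at least one section vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]
```

For example, `build_section_inputs(split_sections(text), max_tokens=4)` on a three-section toy text yields one short input per section, each cut to four words. Truncating per section, rather than truncating the whole paper once, is what keeps the Results and later sections visible to the model in this sketch.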
