SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction
Thomas van Dongen, Gideon Maillette de Buy Wenniger, Lambert Schomaker

TL;DR
This paper introduces SChuBERT, a BERT-based model that encodes full-text scholarly documents as chunks and is trained on a large scholarly document dataset, using these longer inputs to improve citation count prediction accuracy.
Contribution
The paper presents SChuBERT, a novel model that outperforms existing methods by utilizing extensive training data and longer text inputs for citation prediction.
Findings
SChuBERT outperforms state-of-the-art models in citation prediction.
Using larger datasets improves prediction accuracy.
Longer input texts enhance model performance.
Abstract
Predicting the number of citations of scholarly documents is an emerging task in scholarly document processing. Besides the intrinsic merit of this information, it also has a wider use as an imperfect proxy for quality, with the advantage of being cheaply available for large volumes of scholarly documents. Previous work has addressed citation count prediction either with relatively small training datasets or with larger datasets but short, incomplete input text. In this work we combine the open-access ACL Anthology collection with the Semantic Scholar bibliometric database to create a large corpus of scholarly documents with associated citation information, and we propose a new citation prediction model called SChuBERT. In our experiments we compare SChuBERT with several state-of-the-art citation prediction models and show that it outperforms previous methods by a…
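The title describes encoding scholarly documents as chunks with BERT, which is what lets a model read full texts that are far longer than a single BERT input. The snippet below is a minimal sketch of the chunking step only, assuming BERT's standard 512-position limit, non-overlapping chunks, and already-tokenized input; `chunk_tokens` and `wrap_chunk` are hypothetical helpers for illustration, not the authors' code. How SChuBERT encodes each chunk and combines the chunk representations is not stated in this summary.

```python
from typing import List

# Assumption: a BERT-base style encoder accepts at most 512 positions,
# two of which are reserved for the [CLS] and [SEP] special tokens.
MAX_POSITIONS = 512
SPECIAL_TOKENS = 2


def chunk_tokens(tokens: List[str],
                 max_positions: int = MAX_POSITIONS,
                 special_tokens: int = SPECIAL_TOKENS) -> List[List[str]]:
    """Hypothetical helper: split a tokenized document into consecutive,
    non-overlapping chunks that fit the encoder once special tokens are added."""
    size = max_positions - special_tokens
    if size <= 0:
        raise ValueError("max_positions must exceed special_tokens")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def wrap_chunk(chunk: List[str]) -> List[str]:
    """Hypothetical helper: add the [CLS] and [SEP] markers around one chunk."""
    return ["[CLS]"] + chunk + ["[SEP]"]


if __name__ == "__main__":
    # Illustrative document of 1,200 tokens (not real data).
    doc = [f"tok{i}" for i in range(1200)]
    chunks = [wrap_chunk(c) for c in chunk_tokens(doc)]
    print(len(chunks))               # 3
    print([len(c) for c in chunks])  # [512, 512, 182]
```

Each wrapped chunk would then be passed through the encoder separately, and the per-chunk outputs combined into one document representation before predicting the citation count; the choice of aggregation is left open here.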
