SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction

Thomas van Dongen; Gideon Maillette de Buy Wenniger; Lambert Schomaker

arXiv:2012.11740·cs.CL·December 23, 2020

TL;DR

This paper introduces SChuBERT, a BERT-based citation count prediction model trained on a large corpus of scholarly documents with longer input text than earlier work used. In the authors' experiments it outperforms several state-of-the-art citation prediction models.

Contribution

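The paper contributes two things: a large corpus of scholarly documents with associated citation information, built from the open access ACL Anthology and the Semantic Scholar bibliometric database, and SChuBERT, a new citation prediction model that outperforms previous methods on that corpus.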

Findings

01

SChuBERT outperforms state-of-the-art models in citation prediction.

02

Using larger datasets improves prediction accuracy.

03

Longer input texts enhance model performance.
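As the title suggests, longer inputs are handled by splitting a document into chunks that are each encoded with BERT. This page does not describe the procedure in detail, so the following is only a minimal sketch of the chunking step, not the authors' implementation. The helper name `chunk_tokens`, the chunk size of 512 (chosen here because standard BERT models accept at most 512 tokens per input, special tokens included), and the toy whitespace-style tokens are all assumptions made for illustration.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], chunk_size: int = 512) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens.

    Hypothetical helper: the chunk size and the absence of overlap between
    chunks are illustrative assumptions, not details taken from the paper.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


# Toy document: 1,200 placeholder tokens standing in for the text of a full paper.
document = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(document)

# Each chunk would then be encoded separately by BERT (not shown here).
print([len(c) for c in chunks])  # [512, 512, 176]
```

Under these assumptions a 1,200-token document yields three chunks, the last one shorter than the others. A real pipeline would use the BERT tokenizer's subword tokens and reserve room for the special tokens in each chunk.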

Abstract

Predicting the number of citations of scholarly documents is an upcoming task in scholarly document processing. Besides the intrinsic merit of this information, it also has a wider use as an imperfect proxy for quality, which has the advantage of being cheaply available for large volumes of scholarly documents. Previous work has dealt with citation count prediction using relatively small training datasets, or larger datasets but with short, incomplete input text. In this work we leverage the open access ACL Anthology collection in combination with the Semantic Scholar bibliometric database to create a large corpus of scholarly documents with associated citation information, and we propose a new citation prediction model called SChuBERT. In our experiments we compare SChuBERT with several state-of-the-art citation prediction models and show that it outperforms previous methods by a…
