On the Use of Context for Predicting Citation Worthiness of Sentences in Scholarly Articles

Rakesh Gosangi; Ravneet Arora; Mohsen Gheisarieha; Debanjan Mahata; Haimin Zhang

arXiv:2104.08962 · cs.CL · April 20, 2021

TL;DR

This paper investigates how context influences the prediction of whether a sentence in a scholarly article needs a citation (its citation worthiness), using a hierarchical BiLSTM model and a new large dataset.

Contribution

It introduces a new benchmark dataset of over two million sentences that preserves sentence order (and therefore context) and uses document-level train/test splits, and it demonstrates the benefits of contextual embeddings for citation worthiness prediction.
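A document-level split keeps every sentence of a document on the same side of the train/test boundary and in its original order, so a test sentence's neighbours never appear in training while the model can still read them as context. The following is a minimal sketch, not the authors' code: it assumes each record is a hypothetical `(doc_id, position, text, label)` tuple, and the function name `document_level_split` and the 20% test fraction are illustrative choices.

```python
import random
from collections import defaultdict


def document_level_split(records, test_fraction=0.2, seed=0):
    """Split hypothetical (doc_id, position, text, label) records so that each
    document lands entirely in train or entirely in test, keeping sentence order."""
    by_doc = defaultdict(list)
    for record in records:
        by_doc[record[0]].append(record)

    # Shuffle document ids (not sentences) with a seeded RNG for reproducibility.
    doc_ids = sorted(by_doc)
    rng = random.Random(seed)
    rng.shuffle(doc_ids)
    n_test = max(1, round(len(doc_ids) * test_fraction))
    test_ids = set(doc_ids[:n_test])

    train, test = [], []
    for doc_id in sorted(by_doc):
        # Restore the original sentence order within each document.
        ordered = sorted(by_doc[doc_id], key=lambda r: r[1])
        (test if doc_id in test_ids else train).append(ordered)
    return train, test
```

Both outputs are lists of documents, each an ordered list of sentence records, which is the shape a sequence labeler over sentences needs.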

Findings

01. Context improves citation worthiness prediction accuracy.
02. A hierarchical BiLSTM effectively models sentence context.
03. Contextual embeddings enhance model performance.

Abstract

In this paper, we study the importance of context in predicting the citation worthiness of sentences in scholarly articles. We formulate this problem as a sequence labeling task solved using a hierarchical BiLSTM model. We contribute a new benchmark dataset containing over two million sentences and their corresponding labels. We preserve the sentence order in this dataset and perform document-level train/test splits, which importantly allows incorporating contextual information in the modeling process. We evaluate the proposed approach on three benchmark datasets. Our results quantify the benefits of using context and contextual embeddings for citation worthiness. Lastly, through error analysis, we provide insights into cases where context plays an essential role in predicting citation worthiness.
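The abstract frames citation worthiness as sequence labeling: a document is a sequence of sentences, and the model emits one label per sentence. The plain-Python sketch below illustrates that hierarchical structure under stated assumptions and is not the authors' implementation. A word-level BiLSTM turns each sentence into a vector, a sentence-level BiLSTM runs over those vectors so each sentence's representation sees its neighbours, and a sigmoid gives a per-sentence probability. Weights and the token embedding table are random and untrained, tokenization is whitespace splitting, sentences are assumed non-empty, and the names `HierarchicalTagger` and `predict` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTM:
    """Single-direction LSTM with random, untrained weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (c).
        # Each row acts on the concatenation [x_t; h_{t-1}].
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                         for _ in range(hidden_dim)] for gate in "ifoc"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifoc"}

    def run(self, xs):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            z = x + h  # list concatenation: [x_t; h_{t-1}]
            pre = {gate: [sum(w * v for w, v in zip(row, z)) + bias
                          for row, bias in zip(self.W[gate], self.b[gate])]
                   for gate in "ifoc"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class BiLSTM:
    """Runs one LSTM left-to-right and another right-to-left, then concatenates."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.fwd = LSTM(input_dim, hidden_dim, rng)
        self.bwd = LSTM(input_dim, hidden_dim, rng)

    def run(self, xs):
        forward = self.fwd.run(xs)
        backward = self.bwd.run(xs[::-1])[::-1]
        return [f + b for f, b in zip(forward, backward)]


class HierarchicalTagger:
    """Hypothetical word-level BiLSTM -> sentence-level BiLSTM -> sigmoid per sentence."""

    def __init__(self, emb_dim=8, word_hidden=6, sent_hidden=5, seed=0):
        self.rng = random.Random(seed)
        self.emb_dim = emb_dim
        self.embeddings = {}
        self.word_rnn = BiLSTM(emb_dim, word_hidden, self.rng)
        self.sent_rnn = BiLSTM(2 * word_hidden, sent_hidden, self.rng)
        self.w_out = [self.rng.uniform(-0.1, 0.1) for _ in range(2 * sent_hidden)]

    def embed(self, token):
        # Random lookup table standing in for learned or contextual embeddings.
        if token not in self.embeddings:
            self.embeddings[token] = [self.rng.uniform(-1, 1) for _ in range(self.emb_dim)]
        return self.embeddings[token]

    def encode_sentence(self, sentence):
        states = self.word_rnn.run([self.embed(t) for t in sentence.lower().split()])
        half = len(states[0]) // 2
        # Final forward state + backward state at position 0 (both have read the whole sentence).
        return states[-1][:half] + states[0][half:]

    def predict(self, document):
        sent_vecs = [self.encode_sentence(s) for s in document]
        contextual = self.sent_rnn.run(sent_vecs)  # each vector now depends on its neighbours
        return [sigmoid(sum(w * v for w, v in zip(self.w_out, h))) for h in contextual]
```

Because the sentence-level BiLSTM reads the whole document in both directions, replacing one sentence changes the scores of its neighbours, which is the contextual effect the paper sets out to measure. A trained system would learn these weights and could replace the random lookup table with pretrained or contextual embeddings.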

Taxonomy

Topics: Topic Modeling · Scientometrics and bibliometrics research

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Bidirectional LSTM