Large Scale Subject Category Classification of Scholarly Papers with Deep Attentive Neural Networks
Bharath Kandimalla, Shaurya Rohatgi, Jian Wu, C Lee Giles

TL;DR
This paper introduces a deep attentive neural network trained on 9 million abstracts to classify scholarly papers into 104 categories, improving accuracy especially for new papers without citation data.
Contribution
The study presents a novel deep attentive neural network model that classifies papers using only abstracts, outperforming baseline models and addressing citation data limitations.
Findings
The model achieves a micro-F1 of 0.76 on the 104-category classification task.
Word vectors combined with TF-IDF outperform the other text representations tested.
The attention mechanism improves classification accuracy.
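For readers unfamiliar with the headline metric, here is a minimal sketch of how micro-averaged F1 is computed. It assumes single-label predictions, which the summary above does not confirm. The function name `micro_f1` and the toy labels are illustrative only and are not taken from the paper.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1 once.

    Assumes one true label and one predicted label per paper (an assumption
    made for this sketch, not something stated in the summary).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class receives a false positive
            fn[truth] += 1      # true class receives a false negative

    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    if total_tp == 0:
        return 0.0

    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up labels: three of four predictions are correct.
y_true = ["cs", "physics", "cs", "math"]
y_pred = ["cs", "cs", "cs", "math"]
score = micro_f1(y_true, y_pred)
```

In the single-label case every error counts once as a false positive and once as a false negative, so micro-F1 equals plain accuracy. Because the counts are pooled, large categories influence the score more than small ones.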
Abstract
Subject categories of scholarly papers generally refer to the knowledge domain(s) to which the papers belong, examples being computer science or physics. Subject category information can be used for building faceted search for digital library search engines. This can significantly assist users in narrowing down their search space of relevant documents. Unfortunately, many academic papers do not have such information as part of their metadata. Existing methods for solving this task usually focus on unsupervised learning that often relies on citation networks. However, a complete list of papers citing the current paper may not be readily available. In particular, new papers that have few or no citations cannot be classified using such methods. Here, we propose a deep attentive neural network (DANN) that classifies scholarly papers using only their abstracts. The network is trained using 9…
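The truncated abstract names an attentive network but does not describe its architecture. As a rough illustration of what an attention layer over word-level representations generally does, here is a minimal sketch of dot-product attention pooling. It is a generic formulation, not the authors' DANN. The names `attention_pool`, `hidden_states`, and `context` are hypothetical.

```python
import math


def attention_pool(hidden_states, context):
    """Collapse a sequence of vectors into one vector via softmax attention.

    hidden_states: list of equal-length float lists, one per word (hypothetical
                   stand-in for encoder outputs).
    context:       float list of the same length; a query vector that would be
                   learned during training (an assumption for this sketch).
    Returns (pooled_vector, attention_weights).
    """
    # Score each word by its dot product with the context vector.
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]

    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the word vectors.
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


# Toy input: two 2-dimensional "word" vectors and a context that favors the first.
states = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(states, [10.0, 0.0])
```

The weights sum to one. Words whose vectors align with the context vector contribute most to the pooled representation, which a classifier layer could then map to subject categories.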
