Enriched BERT Embeddings for Scholarly Publication Classification
Benjamin Wolff, Eva Seidlmayer, Konrad U. Förstner

TL;DR
This paper explores enriching datasets and fine-tuning pre-trained language models, especially SPECTER2, to improve automatic classification of scholarly articles into research fields, achieving a weighted F1-score of 0.7415.
Contribution
It demonstrates that dataset enrichment and fine-tuning pre-trained models significantly improve scholarly publication classification accuracy.
Findings
Fine-tuning pre-trained language models, particularly SPECTER2, yields the highest classification accuracy.
Enriching the dataset with metadata from bibliographic sources improves classification.
The best model achieves a weighted F1-score of 0.7415.
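The weighted F1-score reported above averages the per-class F1 scores, weighting each class by its share of the true labels. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `weighted_f1` and the three field labels in the usage example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Hypothetical helper for illustration; it follows the usual definition
    in which each class is weighted by its number of true instances.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)  # true instances per class
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy example with made-up labels (not data from the paper).
y_true = ["physics", "physics", "biology", "cs"]
y_pred = ["physics", "biology", "biology", "cs"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights come from the true label counts, frequent research fields influence the score more than rare ones, which matters for a taxonomy with 123 classes of uneven size.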
Abstract
With the rapid expansion of academic literature and the proliferation of preprints, researchers face growing challenges in manually organizing and labeling large volumes of articles. The NSLP 2024 FoRC Shared Task I, organized as a competition, addresses this challenge. The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article. This paper presents our results. We first enrich the dataset (containing English scholarly articles sourced from ORKG and arXiv), then leverage different pre-trained language models (PLMs), specifically BERT, and explore their efficacy in transfer learning for this downstream task. Our experiments encompass feature-based and fine-tuned transfer learning approaches using diverse PLMs optimized for scientific tasks, including SciBERT,…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Weight Decay · Attention Dropout · Dropout · Residual Connection · Softmax · WordPiece · Linear Layer · Adam
