TL;DR
This paper presents an ensemble system that combines transformer-based models with traditional features to classify scientific article abstracts into seven categories. It reports an F1 score of 0.93 and outperforms SciBERT, the previous state-of-the-art model.
Contribution
The paper introduces a novel ensemble approach for scientific article classification that integrates four sub-systems: a RoBERTa classifier, RoBERTa augmented with topic-model (LDA) features, a sentence-level RoBERTa model, and a TF-IDF-based Logistic Regression model.
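This summary does not say how the four sub-systems' outputs are merged, because the abstract below is cut off before that point. One common option is soft voting: average the class-probability vectors that each model emits and take the arg-max. The following is a minimal sketch that assumes this scheme. It uses optional per-model weights, seven hypothetical placeholder labels (`class_0` … `class_6`), and made-up probabilities. `soft_vote` and `ensemble_predict` are illustrative names, not the authors' code or a confirmed description of their method.

```python
from typing import List, Optional, Sequence

# Hypothetical placeholder labels: the seven SDPRA-2021 category names
# are not listed in this summary.
CLASSES: List[str] = [f"class_{i}" for i in range(7)]


def soft_vote(prob_lists: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of the class-probability vectors from each sub-system."""
    if not prob_lists:
        raise ValueError("need at least one sub-system")
    n_classes = len(prob_lists[0])
    if any(len(p) != n_classes for p in prob_lists):
        raise ValueError("all probability vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(prob_lists)  # equal weighting by default
    if len(weights) != len(prob_lists):
        raise ValueError("need one weight per sub-system")
    total = sum(weights)
    return [sum(w * p[c] for w, p in zip(weights, prob_lists)) / total
            for c in range(n_classes)]


def ensemble_predict(prob_lists: Sequence[Sequence[float]],
                     weights: Optional[Sequence[float]] = None) -> str:
    """Return the label with the highest averaged probability."""
    avg = soft_vote(prob_lists, weights)
    best = max(range(len(avg)), key=avg.__getitem__)
    return CLASSES[best]


if __name__ == "__main__":
    # Made-up outputs for one abstract from the four sub-systems.
    outputs = [
        [0.05, 0.60, 0.10, 0.05, 0.10, 0.05, 0.05],  # RoBERTa
        [0.05, 0.55, 0.15, 0.05, 0.10, 0.05, 0.05],  # RoBERTa + LDA features
        [0.10, 0.30, 0.40, 0.05, 0.05, 0.05, 0.05],  # sentence-level RoBERTa
        [0.10, 0.35, 0.30, 0.05, 0.10, 0.05, 0.05],  # TF-IDF + Logistic Regression
    ]
    print(ensemble_predict(outputs))  # class_1 (averaged probability 0.45)
```

In this toy case the sentence-level model alone would choose `class_2`, but the averaged vote settles on `class_1`. Unequal weights let a stronger sub-system count for more. Hard majority voting or a learned stacking classifier would be equally plausible alternatives.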
Findings
Achieved an F1 score of 0.93 on test and validation sets.
Outperformed the existing SOTA model SciBERT.
Demonstrated the effectiveness of ensemble learning that combines transformer models with traditional features.
Abstract
Reviewers often fail to appreciate a researcher's novel ideas and instead provide generic feedback. Proper assignment of reviewers based on their area of expertise is therefore necessary. Moreover, reading every paper from end to end just to assign it to a reviewer is a tedious task. In this paper, we describe the system that our team, FideLIPI, submitted to the SDPRA-2021 shared task [14]. It comprises four independent sub-systems, each capable of classifying abstracts of scientific literature into one of seven given classes. The first is a model based on RoBERTa [10] and built over these abstracts. The second adds topic-model features from Latent Dirichlet Allocation (LDA) [2] to the first model. The third is a sentence-level RoBERTa [10] model. The fourth is a Logistic Regression model built using Term Frequency-Inverse Document Frequency (TF-IDF)…
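The fourth sub-system pairs TF-IDF features with Logistic Regression. The excerpt does not specify the preprocessing, IDF variant, regularisation, or solver, so the following is only a minimal pure-Python sketch built on these assumptions:

- whitespace tokenisation;
- the smoothed IDF ln((1 + n) / (1 + df)) + 1 with L2 normalisation, which matches scikit-learn's `TfidfVectorizer` defaults;
- an unregularised multinomial logistic regression trained by full-batch gradient descent.

The helper names `fit_tfidf`, `transform`, `train_logreg`, and `predict_class` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: naive lowercase whitespace tokenisation.
    return text.lower().split()


def fit_tfidf(docs: List[str]) -> Tuple[Dict[str, int], List[float]]:
    """Build a vocabulary and smoothed IDF weights: ln((1+n)/(1+df)) + 1."""
    n = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # document frequency, not raw counts
    vocab = {term: i for i, term in enumerate(sorted(df))}
    idf = [0.0] * len(vocab)
    for term, i in vocab.items():
        idf[i] = math.log((1 + n) / (1 + df[term])) + 1.0
    return vocab, idf


def transform(doc: str, vocab: Dict[str, int], idf: List[float]) -> List[float]:
    """Raw term count times IDF, then L2-normalised; unknown terms are dropped."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(tokenize(doc)).items():
        if term in vocab:
            vec[vocab[term]] = count * idf[vocab[term]]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_logreg(X: List[List[float]], y: List[int], n_classes: int,
                 lr: float = 0.5, epochs: int = 200):
    """Multinomial logistic regression (no regularisation) via gradient descent."""
    n, n_features = len(X), len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * n_features for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            p = softmax([sum(w * v for w, v in zip(W[c], x)) + b[c]
                         for c in range(n_classes)])
            for c in range(n_classes):
                err = p[c] - (1.0 if c == label else 0.0)  # cross-entropy gradient
                gb[c] += err
                for j, v in enumerate(x):
                    gW[c][j] += err * v
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for j in range(n_features):
                W[c][j] -= lr * gW[c][j] / n
    return W, b


def predict_class(x: List[float], W: List[List[float]], b: List[float]) -> int:
    scores = [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(W))]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy abstracts with two made-up labels (0 = ML, 1 = biology).
    docs = ["neural network training", "deep neural model",
            "protein gene expression", "cell gene sequencing"]
    labels = [0, 0, 1, 1]
    vocab, idf = fit_tfidf(docs)
    X = [transform(d, vocab, idf) for d in docs]
    W, b = train_logreg(X, labels, n_classes=2)
    print(predict_class(transform("neural training", vocab, idf), W, b))  # 0
```

A real system would more likely call a library implementation with L2 regularisation and a tuned regularisation strength. The plain loop is used here only so that the TF-IDF weighting and the softmax gradient stay visible.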
Taxonomy
Methods: Linear Layer · Linear Warmup With Linear Decay · Softmax · Adam · Multi-Head Attention · Attention Dropout · Weight Decay · Residual Connection · Dropout · WordPiece
