Hierarchical Multi-Label Classification of Scientific Documents
Mobashir Sadat, Cornelia Caragea

TL;DR
This paper introduces SciHTC, a large dataset for hierarchical multi-label classification of scientific papers, and proposes a multi-task learning model that uses keyword labeling as an auxiliary task to improve topic classification.
Contribution
The paper presents SciHTC, a dataset of 186,160 scientific papers labeled with 1,233 categories from the ACM CCS tree, and a multi-task learning approach that adds keyword labeling as an auxiliary task to topic classification.
Findings
Achieved a Macro-F1 score of 34.57% with the proposed model.
Provided a new dataset with 186,160 papers and 1,233 categories.
Established strong baselines for hierarchical multi-label classification.
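For readers unfamiliar with the metric, the sketch below shows one common way to compute Macro-F1 in a multi-label setting: an F1 score is computed per category and the scores are averaged with equal weight, so rare categories count as much as frequent ones. The function name `macro_f1`, the toy labels, and the choice to score a category with no gold or predicted instances as 0.0 are assumptions made for illustration. The paper's exact evaluation protocol is not described on this page.

```python
from typing import List, Set


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 over `labels` (illustrative helper)."""
    scores = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this label.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Assumption: a label that never occurs in gold or predictions scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with three documents and three hypothetical labels.
gold = [{"A", "B"}, {"A"}, {"C"}]
pred = [{"A"}, {"A", "B"}, {"B"}]
score = macro_f1(gold, pred, ["A", "B", "C"])
print(round(score, 4))
```

In the toy example, label A is predicted perfectly while B and C are never predicted correctly, so the average is pulled down to one third even though most individual predictions involve A.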
Abstract
Automatic topic classification has been studied extensively to assist in managing and indexing scientific documents in a digital collection. With the large number of topics that have become available in recent years, it has become necessary to arrange them in a hierarchy. Therefore, automatic classification systems need to be able to classify documents hierarchically. In addition, each paper is often assigned to more than one relevant topic. For example, a paper can be assigned to several topics in a hierarchy tree. In this paper, we introduce a new dataset for hierarchical multi-label text classification (HMLTC) of scientific papers called SciHTC, which contains 186,160 papers and 1,233 categories from the ACM CCS tree. We establish strong baselines for HMLTC and propose a multi-task learning approach for topic classification with keyword labeling as an auxiliary task. Our best model…
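To make the hierarchical multi-label setting concrete, here is a minimal sketch of how a paper's label set can be represented against a category tree. It uses a common convention in which assigning a category implies all of its ancestors. The `PARENT` map is a tiny hand-written fragment for illustration only and is not taken from SciHTC, and `with_ancestors` is a hypothetical helper. Whether the dataset stores ancestor labels explicitly is not stated in the abstract.

```python
from typing import Dict, Optional, Set

# Illustrative child -> parent map; None marks a root category.
PARENT: Dict[str, Optional[str]] = {
    "Computing methodologies": None,
    "Machine learning": "Computing methodologies",
    "Information systems": None,
    "Information retrieval": "Information systems",
}


def with_ancestors(labels: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return `labels` plus every ancestor of each label in the tree."""
    expanded: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Walk up to the root; stop early if this chain was already added.
        while node is not None and node not in expanded:
            expanded.add(node)
            node = parent[node]
    return expanded


# A paper tagged with two leaf topics from different branches of the tree.
paper_labels = {"Machine learning", "Information retrieval"}
full_labels = with_ancestors(paper_labels, PARENT)
print(sorted(full_labels))
```

Under this convention a single paper carries labels at several levels and in several branches at once, which is what distinguishes the task from flat single-label classification.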
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Web Data Mining and Analysis
