Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers
Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han

TL;DR
FUTEX is a novel framework for weakly supervised multi-label classification of full-text scientific papers, leveraging structural information like citation networks and hierarchical organization to improve classification accuracy without extensive labeled data.
Contribution
The paper introduces FUTEX, which effectively utilizes structural signals such as citation links and section hierarchy for weakly supervised classification of full-text papers, addressing fine-grained and multi-label challenges.
Findings
FUTEX outperforms existing weakly supervised baselines.
FUTEX matches fully supervised classifiers with minimal labeled data.
Structural information significantly enhances classification performance.
Abstract
Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords). Existing studies on weakly supervised paper classification are less concerned with two challenges: (1) Papers should be classified into not only coarse-grained research topics but also fine-grained themes, and potentially into multiple themes, given a large and fine-grained label space; and (2) full text should be utilized to complement the paper title and abstract for classification. Moreover, instead of viewing the entire paper as a long linear sequence, one should exploit the structural information such as citation links across papers and the hierarchy of sections and paragraphs in each paper. To tackle these challenges, in this study, we propose…
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Topic Modeling
