SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics

Zhiwen You; Kanyao Han; Haotian Zhu; Bertram Ludäscher; Jana Diesner

arXiv:2410.01946 · cs.CL · October 4, 2024

TL;DR

SciPrompt is a framework that automatically enriches verbalizers with scientific terms to improve prompt-based classification of scientific topics, especially in low-resource scenarios.

Contribution

It introduces an automatic retrieval method for domain-specific terms and a weighted verbalization strategy, enhancing prompt-based fine-tuning for scientific text classification.

Findings

01. Outperforms state-of-the-art prompt-based methods in scientific classification

02. Effective in few-shot and zero-shot settings

03. Excels in classifying fine-grained and emerging topics

Abstract

Prompt-based fine-tuning has become an essential method for eliciting information encoded in pre-trained language models for a variety of tasks, including text classification. For multi-class classification tasks, prompt-based fine-tuning under low-resource scenarios has resulted in performance levels comparable to those of full fine-tuning methods. Previous studies have used crafted prompt templates and verbalizers, mapping from the label-term space to the class space, to solve the classification problem as a masked language modeling task. However, cross-domain and fine-grained prompt-based fine-tuning with an automatically enriched verbalizer remains unexplored, mainly due to the difficulty and costs of manually selecting domain label terms for the verbalizer, which requires humans with domain expertise. To address this challenge, we introduce SciPrompt, a framework designed to…
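
To make the verbalizer idea in the abstract concrete, the following is a minimal sketch of how probabilities predicted at a masked position might be aggregated into class scores through a weighted mapping from label terms to classes. It is not taken from the SciPrompt repository: the function name `classify_with_verbalizer`, the label terms, their weights, and the token probabilities are all hypothetical values chosen for illustration, and the weighted average used here is only one plausible aggregation rule.

```python
from typing import Dict

def classify_with_verbalizer(
    token_probs: Dict[str, float],
    verbalizer: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Turn mask-position token probabilities into normalized class scores.

    token_probs: probability a masked language model assigns to each
        candidate token at the [MASK] position (made-up numbers below).
    verbalizer: class label -> {label term: weight}. The weights are
        hypothetical stand-ins for a learned or retrieved relevance score.
    """
    scores: Dict[str, float] = {}
    for label, weighted_terms in verbalizer.items():
        total_weight = sum(weighted_terms.values())
        # Weighted average of the probabilities of this class's label terms.
        weighted_sum = sum(
            weight * token_probs.get(term, 0.0)
            for term, weight in weighted_terms.items()
        )
        scores[label] = weighted_sum / total_weight if total_weight else 0.0
    # Normalize so the class scores sum to 1 (when any score is non-zero).
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()} if norm else scores

# Illustrative inputs only: not real model output.
token_probs = {"neural": 0.30, "language": 0.20, "protein": 0.05, "genome": 0.05}
verbalizer = {
    "cs.CL": {"language": 1.0, "neural": 0.5},
    "q-bio": {"protein": 1.0, "genome": 1.0},
}

scores = classify_with_verbalizer(token_probs, verbalizer)
predicted = max(scores, key=scores.get)
print(predicted, scores)
```

In this sketch, enriching the verbalizer amounts to adding more domain terms to each class's dictionary, and the weights let less reliable terms contribute less to the class score.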

Code & Models

Repositories

zhiwenyou103/SciPrompt
PyTorch · Official

Datasets

uzw/Emerging_NLP
dataset · 52 downloads

Videos

SciPrompt: Knowledge-Augmented Prompting for Fine-Grained Categorization of Scientific Topics · Underline

Taxonomy

Topics: Topic Modeling · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies