TL;DR
This paper introduces Citss, a self-supervised contrastive learning framework that adapts pretrained language models (PLMs) for citation classification. It addresses labeled-data scarcity and contextual noise, and it improves performance across multiple benchmarks.
Contribution
The paper proposes Citss, a novel contrastive learning approach that works with both encoder-based and decoder-based PLMs and improves citation classification when labeled data are limited.
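Contrastive learning of this kind trains the model so that two views of the same citation context embed close together, while views of other contexts in the batch are pushed apart. The sketch below shows a standard InfoNCE loss with in-batch negatives over plain Python vectors. It is a generic illustration, not the Citss objective: the temperature `tau = 0.05` and the helper names `cosine` and `info_nce` are assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(anchors: List[Sequence[float]],
             positives: List[Sequence[float]],
             tau: float = 0.05) -> float:
    """Mean InfoNCE loss: anchor i must pick out positive i among all positives.

    Every other positive in the batch acts as a negative for anchor i.
    tau (temperature) is a hypothetical default, not taken from the paper.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / tau for j in range(n)]
        peak = max(logits)  # subtract the max before exp() for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n
```

When each anchor is most similar to its own positive, the loss approaches zero. Swapping the positives makes the loss large, which is the signal that pulls matching views together.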
Findings
Outperforms previous state-of-the-art methods on benchmark datasets.
Effective in reducing reliance on keyphrases and handling contextual noise.
Compatible with both encoder-based PLMs and decoder-based LLMs.
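Encoder and decoder models can both feed a contrastive loss once each citation context is reduced to a single vector. A common convention uses mean pooling over non-padding tokens for bidirectional encoders and the last non-padding token for causal decoders, because only that last token has attended to the whole input. The sketch below illustrates this convention with plain lists. Whether Citss pools in exactly this way is an assumption, and `pool` is a hypothetical helper.

```python
from typing import List, Sequence


def pool(hidden: Sequence[Sequence[float]], mask: Sequence[int], arch: str) -> List[float]:
    """Reduce per-token hidden states to one context embedding.

    encoder: mean over tokens whose attention-mask value is 1.
    decoder: hidden state of the last token whose mask value is 1
             (works for left or right padding, since padding is filtered out first).
    """
    kept = [h for h, m in zip(hidden, mask) if m]
    if arch == "encoder":
        dim = len(kept[0])
        return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]
    if arch == "decoder":
        return list(kept[-1])
    raise ValueError(f"unknown architecture: {arch!r}")
```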
Abstract
Citation classification, which identifies the intention behind academic citations, is pivotal for scholarly analysis. Previous work suggests fine-tuning pretrained language models (PLMs) on citation classification datasets to benefit from the linguistic knowledge they gained during pretraining. However, directly fine-tuning for citation classification is challenging due to labeled data scarcity, contextual noise, and spurious keyphrase correlations. In this paper, we present a novel framework, Citss, that adapts PLMs to overcome these challenges. Citss introduces self-supervised contrastive learning to alleviate data scarcity, and is equipped with two specialized strategies to obtain the contrastive pairs: sentence-level cropping, which enhances focus on target citations within long contexts, and keyphrase perturbation, which mitigates reliance on specific keyphrases.…
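The two pair-construction strategies can be sketched as data augmentations that each produce one view of the same citation context. The sketch makes several hypothetical choices: the `[CITE]` marker for the target citation, the naive regex sentence splitter, the window size `max_sents`, and the `SYNONYMS` paraphrase table. This summary does not say how Citss selects keyphrases, which replacements it uses, or how it combines the two strategies.

```python
import random
import re
from typing import Dict, List, Tuple

CITE_MARKER = "[CITE]"  # hypothetical placeholder marking the target citation

# Hypothetical keyphrase -> paraphrase table (lowercase keys); not from the paper.
SYNONYMS: Dict[str, List[str]] = {
    "we use": ["we adopt", "we employ"],
    "following": ["as in", "similar to"],
}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_crop(context: str, rng: random.Random, max_sents: int = 2) -> str:
    """Keep a random contiguous window of sentences that still contains the target citation."""
    sents = split_sentences(context)
    target = next(i for i, s in enumerate(sents) if CITE_MARKER in s)
    width = rng.randint(1, min(max_sents, len(sents)))
    # The window [start, start + width) must cover `target` and stay inside the context.
    lo = max(0, target - width + 1)
    hi = min(target, len(sents) - width)
    start = rng.randint(lo, hi)
    return " ".join(sents[start:start + width])


def perturb_keyphrases(text: str, synonyms: Dict[str, List[str]], rng: random.Random) -> str:
    """Swap each known keyphrase for a random paraphrase, keeping an initial capital."""
    keys = sorted(synonyms, key=len, reverse=True)  # try longer keyphrases first
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b",
                         re.IGNORECASE)

    def swap(match):
        found = match.group(0)
        repl = rng.choice(synonyms[found.lower()])
        return repl[0].upper() + repl[1:] if found[0].isupper() else repl

    return pattern.sub(swap, text)


def make_views(context: str, rng: random.Random) -> Tuple[str, str]:
    """Two augmented views of one context, treated as a positive pair."""
    return sentence_crop(context, rng), perturb_keyphrases(context, SYNONYMS, rng)


# Usage: both views keep the target citation; other contexts in the batch serve as negatives.
ctx = "Prior work studied X. We use the parser [CITE] for preprocessing. Results are in Table 2."
view_a, view_b = make_views(ctx, random.Random(0))
```

Cropping removes unrelated surrounding sentences while the citation stays in view. Paraphrasing cue words such as "we use" keeps the citation's intent but changes its surface form, so the encoder cannot rely on one exact phrase.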
Taxonomy
Methods Focus · Contrastive Learning
