Co-training for Low Resource Scientific Natural Language Inference
Mobashir Sadat, Cornelia Caragea

TL;DR
This paper introduces a co-training approach that leverages training dynamics to weight automatically labeled data, improving scientific natural language inference performance in low-resource settings.
Contribution
It proposes a novel co-training method that uses classifier behavior over epochs to assign importance weights to noisy labels, enhancing semi-supervised learning for scientific NLI.
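The weighting idea can be illustrated with a minimal sketch. This summary does not give the paper's weighting formula, so the sketch assumes the weight of a distantly supervised label is the classifier's mean confidence in that label across training epochs. The names `dynamics_weights`, `weighted_cross_entropy`, and `prob_history` are hypothetical and chosen for illustration only.

```python
import math


def dynamics_weights(prob_history, noisy_labels):
    """Weight each noisy label by the mean confidence assigned to it over epochs.

    prob_history[e][i][c] is the probability a classifier gave class c for
    example i at epoch e. Averaging over epochs is an assumption made for
    this sketch, not the paper's stated formula.
    """
    n_epochs = len(prob_history)
    weights = []
    for i, label in enumerate(noisy_labels):
        mean_conf = sum(prob_history[e][i][label] for e in range(n_epochs)) / n_epochs
        weights.append(mean_conf)
    return weights


def weighted_cross_entropy(probs, labels, weights):
    """Cross-entropy in which each example contributes in proportion to its weight."""
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
    return total / max(sum(weights), 1e-12)


# Hypothetical training dynamics of classifier A: 2 epochs, 2 examples, 2 classes.
history_a = [
    [[0.9, 0.1], [0.4, 0.6]],  # epoch 1
    [[0.7, 0.3], [0.2, 0.8]],  # epoch 2
]
distant_labels = [0, 0]

# In a co-training setup, weights derived from classifier A would scale
# the loss of classifier B, and vice versa.
weights_from_a = dynamics_weights(history_a, distant_labels)
probs_b = [[0.5, 0.5], [0.5, 0.5]]  # hypothetical predictions of classifier B
loss_b = weighted_cross_entropy(probs_b, distant_labels, weights_from_a)
```

In this sketch the second example, whose label the classifier consistently doubts, is down-weighted but still contributes to the loss, which mirrors the stated preference for importance weights over filtering.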
Findings
Achieves a 1.5% improvement in Macro F1 over the distant supervision baseline
Outperforms several strong SSL baselines
Makes effective use of noisy, automatically labeled data by weighting it instead of filtering it out
Abstract
Scientific Natural Language Inference (NLI) is the task of predicting the semantic relation between a pair of sentences extracted from research articles. The automatic annotation method based on distant supervision for the training set of SciNLI (Sadat and Caragea, 2022b), the first and most popular dataset for this task, results in label noise which inevitably degenerates the performance of classifiers. In this paper, we propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels, reflective of the manner they are used in the subsequent training epochs. That is, unlike the existing semi-supervised learning (SSL) approaches, we consider the historical behavior of the classifiers to evaluate the quality of the automatically annotated labels. Furthermore, by assigning importance weights instead of filtering…
Taxonomy
Topics: Topic Modeling
Methods: Sparse Evolutionary Training
