Co-training for Low Resource Scientific Natural Language Inference

Mobashir Sadat; Cornelia Caragea

arXiv:2406.14666 · cs.CL · June 24, 2024

TL;DR

This paper introduces a co-training approach that leverages training dynamics to weight automatically labeled data, improving scientific natural language inference performance in low-resource settings.

Contribution

It proposes a novel co-training method that uses classifier behavior over epochs to assign importance weights to noisy labels, enhancing semi-supervised learning for scientific NLI.

Findings

01 Achieved a 1.5% macro F1 improvement over the baseline

02 Outperformed several strong SSL baselines

03 Effectively utilizes noisy, automatically labeled data

Abstract

Scientific Natural Language Inference (NLI) is the task of predicting the semantic relation between a pair of sentences extracted from research articles. The automatic annotation method based on distant supervision for the training set of SciNLI (Sadat and Caragea, 2022b), the first and most popular dataset for this task, results in label noise which inevitably degenerates the performance of classifiers. In this paper, we propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels, reflective of the manner they are used in the subsequent training epochs. That is, unlike the existing semi-supervised learning (SSL) approaches, we consider the historical behavior of the classifiers to evaluate the quality of the automatically annotated labels. Furthermore, by assigning importance weights instead of filtering…
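The weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not specify how the weights are computed, so the sketch assumes, purely for illustration, that the weight of a distantly supervised label is the mean probability a peer classifier assigned to that label across past epochs. The function names `confidence_weights` and `weighted_cross_entropy` are hypothetical.

```python
import math
from typing import List, Sequence


def confidence_weights(
    prob_history: Sequence[Sequence[Sequence[float]]],
    noisy_labels: Sequence[int],
) -> List[float]:
    """Assumed proxy for training dynamics (not taken from the paper).

    prob_history[e][i][c] is the probability the peer classifier gave
    to class c for example i after epoch e. The weight of example i is
    the mean probability assigned to its noisy label over all epochs.
    """
    n_epochs = len(prob_history)
    weights = []
    for i, label in enumerate(noisy_labels):
        total = sum(prob_history[e][i][label] for e in range(n_epochs))
        weights.append(total / n_epochs)
    return weights


def weighted_cross_entropy(
    probs: Sequence[Sequence[float]],
    noisy_labels: Sequence[int],
    weights: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Cross-entropy on noisy labels where each example is scaled by its
    weight, so low-weight examples are down-weighted rather than dropped."""
    weighted_sum = 0.0
    for p, y, w in zip(probs, noisy_labels, weights):
        weighted_sum += w * -math.log(p[y] + eps)
    return weighted_sum / (sum(weights) + eps)


if __name__ == "__main__":
    # Two epochs, two examples, two classes (toy numbers).
    history_of_classifier_a = [
        [[0.9, 0.1], [0.4, 0.6]],
        [[0.7, 0.3], [0.2, 0.8]],
    ]
    labels = [0, 0]
    # In a co-training setup, weights derived from classifier A would be
    # used when training classifier B, and vice versa.
    w = confidence_weights(history_of_classifier_a, labels)
    current_probs_of_classifier_b = [[0.6, 0.4], [0.5, 0.5]]
    print(w, weighted_cross_entropy(current_probs_of_classifier_b, labels, w))
```

In this sketch an example whose noisy label the peer classifier consistently doubted still contributes to the loss, only with a smaller weight, which mirrors the abstract's contrast between importance weighting and filtering.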

Code & Models

Repositories

msadat3/weighted_cotraining (Official)

Videos

Co-training for Low Resource Scientific Natural Language Inference (Underline)

Taxonomy

Topics: Topic Modeling

Methods: Sparse Evolutionary Training