LynyrdSkynyrd at WNUT-2020 Task 2: Semi-Supervised Learning for Identification of Informative COVID-19 English Tweets

Abhilasha Sancheti; Kushal Chawla; Gaurav Verma

arXiv:2009.03849·cs.CL·September 9, 2020

TL;DR

This paper presents a semi-supervised ensemble that combines traditional feature-based classifiers with pre-trained language models to identify informative COVID-19 English tweets, reaching an F1-score of 0.9179 on the provided validation set and 0.8805 on the blind test set.

Contribution

It introduces an ensemble that combines feature-based classifiers and pre-trained language models with pseudo-labelling, which lets the system learn from unlabelled Twitter data released on the pandemic.

Findings

01

Achieved an F1-score of 0.9179 on the provided validation set.

02

Achieved an F1-score of 0.8805 on the blind test set.

03

Demonstrated the effectiveness of a semi-supervised ensemble for this task.

Abstract

We describe our system for the WNUT-2020 shared task on the identification of informative COVID-19 English tweets. Our system is an ensemble of various machine learning methods, leveraging both traditional feature-based classifiers and recent advances in pre-trained language models that help capture the syntactic, semantic, and contextual features of the tweets. We further employ pseudo-labelling to incorporate the unlabelled Twitter data released on the pandemic. Our best-performing model achieves an F1-score of 0.9179 on the provided validation set and 0.8805 on the blind test set.
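The two ideas named in the abstract, ensembling and pseudo-labelling, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how ensemble members are combined or how pseudo-labels are selected, so probability averaging and the 0.9 confidence threshold below are assumptions, and `keyword_model`, `digit_model`, and the example tweets are hypothetical stand-ins for the real classifiers and data.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: each ensemble member is a function mapping a tweet to
# P(informative). Real members would be trained classifiers.
Model = Callable[[str], float]


def ensemble_probability(models: Sequence[Model], tweet: str) -> float:
    """Average the informative-class probability over all members (assumed rule)."""
    return sum(model(tweet) for model in models) / len(models)


def pseudo_label(
    models: Sequence[Model],
    unlabelled: Sequence[str],
    threshold: float = 0.9,  # assumed confidence cut-off, not from the paper
) -> List[Tuple[str, int]]:
    """Keep only tweets the ensemble is confident about, paired with the predicted label."""
    selected: List[Tuple[str, int]] = []
    for tweet in unlabelled:
        p = ensemble_probability(models, tweet)
        if p >= threshold:
            selected.append((tweet, 1))  # confidently informative
        elif p <= 1.0 - threshold:
            selected.append((tweet, 0))  # confidently uninformative
        # Tweets in between are left unlabelled.
    return selected


# Hypothetical toy members standing in for trained classifiers.
def keyword_model(tweet: str) -> float:
    return 0.95 if "cases" in tweet.lower() else 0.05


def digit_model(tweet: str) -> float:
    return 0.9 if any(ch.isdigit() for ch in tweet) else 0.1


unlabelled = [
    "120 new cases confirmed in the county",
    "stay strong everyone",
    "cases are scary",
]
pseudo_labelled = pseudo_label([keyword_model, digit_model], unlabelled)
for tweet, label in pseudo_labelled:
    print(label, tweet)
```

In a full pipeline the selected pairs would be appended to the labelled training set and the models retrained. The third tweet is skipped here because the two toy members disagree, which shows how a confidence threshold limits the label noise introduced by pseudo-labelling.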
