Auditing and Robustifying COVID-19 Misinformation Datasets via Anticontent Sampling

Clay H. Yoo; Ashiqur R. KhudaBukhsh

arXiv:2310.07078 · cs.LG · October 12, 2023

TL;DR

This paper evaluates COVID-19 misinformation datasets for real-world robustness and introduces an active learning method that enhances classifier resilience against diverse anticontent without manual labeling.

Contribution

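The paper shows that existing COVID-19 misinformation datasets cover little of the richness and topical diversity of the negative class (anticontent) seen in the wild, and it proposes an annotation-free active learning pipeline that samples challenging anticontent to improve classifier robustness.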

Findings

01 Models trained on existing datasets are vulnerable to anticontent in real-world scenarios.

02 The proposed active learning pipeline effectively augments training data with challenging anticontent.

03 Classifiers become more robust after applying the anticontent sampling method.

Abstract

This paper makes two key contributions. First, it argues that highly specialized rare content classifiers trained on small data typically have limited exposure to the richness and topical diversity of the negative class (dubbed anticontent) as observed in the wild. As a result, these classifiers' strong performance observed on the test set may not translate into real-world settings. In the context of COVID-19 misinformation detection, we conduct an in-the-wild audit of multiple datasets and demonstrate that models trained with several prominently cited recent datasets are vulnerable to anticontent when evaluated in the wild. Second, we present a novel active learning pipeline that requires zero manual annotation and iteratively augments the training data with challenging anticontent, robustifying these classifiers.
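The iterative augmentation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier, the sampling criterion, or where the anticontent comes from, so everything below is an assumption made for illustration: a tiny naive Bayes model stands in for the classifier, the pool is assumed to contain only negative-class text (which is what would make manual labels unnecessary), and the names `train_nb`, `prob_positive`, `robustify`, `batch`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train_nb(docs, labels):
    """Fit a tiny multinomial naive Bayes model (stand-in classifier, an assumption)."""
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    vocab = set(counts[0]) | set(counts[1])
    return counts, n_docs, vocab


def prob_positive(model, doc):
    """Return the model's probability that doc belongs to the positive class."""
    counts, n_docs, vocab = model
    total_docs = n_docs[0] + n_docs[1]
    log_score = {}
    for y in (0, 1):
        total = sum(counts[y].values())
        score = math.log(n_docs[y] / total_docs)
        for tok in tokenize(doc):
            if tok in vocab:  # ignore tokens never seen in training
                score += math.log((counts[y][tok] + 1) / (total + len(vocab)))
        log_score[y] = score
    top = max(log_score.values())
    e0 = math.exp(log_score[0] - top)
    e1 = math.exp(log_score[1] - top)
    return e1 / (e0 + e1)


def robustify(docs, labels, anticontent_pool, rounds=3, batch=2, threshold=0.5):
    """Iteratively add the pool items the classifier wrongly flags as positive.

    Assumes every pool item is truly negative, so each one can be added with
    label 0 and no human annotation (a hypothetical setup, not the paper's).
    """
    docs, labels, pool = list(docs), list(labels), list(anticontent_pool)
    for _ in range(rounds):
        model = train_nb(docs, labels)
        scored = sorted(((prob_positive(model, d), d) for d in pool), reverse=True)
        hard = [d for p, d in scored if p > threshold][:batch]
        if not hard:
            break  # nothing in the pool fools the classifier any more
        docs.extend(hard)
        labels.extend([0] * len(hard))
        pool = [d for d in pool if d not in hard]
    return train_nb(docs, labels), docs, labels


# Toy data, invented for illustration only.
seed_docs = ["vaccine causes autism hoax", "5g spreads the virus",
             "wash your hands often", "wear a mask indoors"]
seed_labels = [1, 1, 0, 0]
pool = ["the virus scanner found a trojan",
        "5g rollout expands to rural towns",
        "my cat sleeps all day"]

before = train_nb(seed_docs, seed_labels)
after, aug_docs, aug_labels = robustify(seed_docs, seed_labels, pool)
```

In this toy run the seed model flags the off-topic sentence about a virus scanner as positive because it shares surface vocabulary with the positive class. After that sentence is added back as a negative and the model is retrained, it is no longer flagged, while the seed positive examples still score as positive.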

Taxonomy

Topics: Misinformation and Its Impacts · COVID-19 diagnosis using AI · SARS-CoV-2 detection and testing