Mining Reasons For And Against Vaccination From Unstructured Data Using Nichesourcing and AI Data Augmentation
Damián Ariel Furman, Juan Junqueras, Z. Burçe Gümüslü, Edgar Altszyler, Joaquin Navajas, Ophelia Deroy, Justin Sulik

TL;DR
This paper introduces RFAV, a dataset for analyzing reasons for and against vaccination in unstructured text, built with nichesourcing and AI data augmentation using GPT-4 and GPT-3.5-Turbo to improve mining accuracy.
Contribution
It provides a novel dataset and methodology for extracting vaccination reasons from unstructured data, enhanced by AI augmentation and detailed annotation guidelines.
Findings
AI-augmented data improves reason-extraction accuracy.
Nichesourcing effectively captures subjective vaccination reasons.
Published dataset and models facilitate future research.
Abstract
We present Reasons For and Against Vaccination (RFAV), a dataset for predicting reasons for and against vaccination, and the scientific authorities used to justify them, annotated through nichesourcing and augmented using GPT-4 and GPT-3.5-Turbo. We show how these reasons can be mined from unstructured text under different task definitions, despite the high level of subjectivity involved, and we explore the impact of artificially augmented data generated through in-context learning with GPT-4 and GPT-3.5-Turbo. We publish the dataset and the trained models, along with the annotation manual used to train annotators and define the task.
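The abstract does not describe how the augmentation prompts were built. The sketch below is only an illustration of one common way to prepare in-context-learning augmentation for an imbalanced label set. It first counts how many synthetic examples each label would need to match the majority class. It then assembles a few-shot prompt from labelled seed sentences. Several details are hypothetical: the labels `FOR`/`AGAINST`, the toy sentences, the prompt wording, and the helper names `augmentation_budget` and `build_few_shot_prompt`. The code only builds prompt strings. It does not call GPT-4, GPT-3.5-Turbo, or any API.

```python
from collections import Counter
import random


def augmentation_budget(labels):
    """Hypothetical helper: synthetic examples needed per label to match the majority class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


def build_few_shot_prompt(examples, label, k=3, seed=0):
    """Hypothetical helper: build an in-context-learning prompt asking a
    language model for one new sentence carrying `label`.

    `examples` is a list of (text, label) pairs; up to `k` seed sentences
    with the target label are sampled as demonstrations.
    """
    rng = random.Random(seed)  # fixed seed so the sampled shots are reproducible
    pool = [text for text, lab in examples if lab == label]
    shots = rng.sample(pool, min(k, len(pool)))
    lines = [f"Write a new sentence expressing a reason labelled '{label}'.", ""]
    for text in shots:
        lines.append(f"Example: {text}")
    lines.append("New example:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy, invented seed data for illustration only (not from RFAV).
    seed_data = [
        ("Vaccines protect my grandparents.", "FOR"),
        ("My doctor recommended it.", "FOR"),
        ("It lets schools stay open.", "FOR"),
        ("I worry about side effects.", "AGAINST"),
    ]
    budget = augmentation_budget([lab for _, lab in seed_data])
    print(budget)
    for label, needed in budget.items():
        if needed > 0:
            # In a real pipeline, each prompt would be sent to a chat model
            # `needed` times and the outputs filtered before training.
            print(build_few_shot_prompt(seed_data, label))
```

This design keeps prompt construction separate from the model call. As a result, the balancing logic can be inspected and tested offline before any generated text is added to the training set.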
Taxonomy
Topics: Imbalanced Data Classification Techniques
