COVIDRead: A Large-scale Question Answering Dataset on COVID-19

Tanik Saikh; Sovan Kumar Sahoo; Asif Ekbal; Pushpak Bhattacharyya

arXiv:2110.09321 · cs.CL · October 19, 2021

TL;DR

COVIDRead is a large-scale, manually verified question answering dataset on COVID-19, enabling research and development of AI models to extract relevant information during the pandemic.

Contribution

This paper introduces COVIDRead, which the authors describe as the first large-scale COVID-19 QA dataset, with more than 100k question-answer pairs, and provides baseline neural network models for the task.

Findings

01

Baseline models achieved F1 scores between 32.03% and 37.19%.

02

The dataset facilitates research on COVID-19 information extraction.

03

The authors present COVIDRead as the first large-volume COVID-19 QA dataset.

Abstract

During this pandemic situation, extracting any relevant information related to COVID-19 will be immensely beneficial to the community at large. In this paper, we present a very important resource, COVIDRead, a Stanford Question Answering Dataset (SQuAD)-like dataset of more than 100k question-answer pairs. The dataset consists of Context-Answer-Question triples. The questions are first constructed from the context in an automated way. The system-generated questions are then manually checked by human annotators. This is a precious resource that could serve many purposes, ranging from answering the general public's queries about this very uncommon disease to helping journal editors and associate editors manage articles. We establish several end-to-end neural network based baseline models, which attain F1 scores ranging from a lowest of 32.03% to a highest of 37.19%. To the best of our knowledge, we are…
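To make the Context-Answer-Question triple and the reported F1 scores concrete, here is a minimal Python sketch. It assumes a SQuAD-style record in which the answer is a span of the context, and a token-overlap F1 of the kind used for SQuAD evaluation. The field names (`context`, `question`, `answer`, `answer_start`), the `Triple` class, and the example text are hypothetical and are not taken from COVIDRead; the paper's exact file format and scoring script are not described here. The official SQuAD scorer also strips punctuation and articles before comparing tokens, which this sketch omits.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Triple:
    """One Context-Answer-Question triple (field names are hypothetical)."""
    context: str
    question: str
    answer: str
    answer_start: int  # character offset of the answer inside the context


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Invented example text, not taken from COVIDRead.
context = "Coronaviruses are enveloped RNA viruses."
example = Triple(
    context=context,
    question="What kind of viruses are coronaviruses?",
    answer="enveloped RNA viruses",
    answer_start=context.index("enveloped RNA viruses"),
)
```

Under these assumptions, a prediction that recovers only part of the reference span receives partial credit, and a dataset-level F1 such as the 32.03% to 37.19% range reported for the baselines would be an average of per-question scores of this kind.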

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques