Ribonucleic acid (RNA) virus and coronavirus in Google Dataset Search: their scope and epidemiological correlation
Manuel Blázquez-Ochando, Juan-José Prieto-Gutiérrez

TL;DR
This study analyzes RNA virus datasets in Google Dataset Search, revealing limited reusability, increasing publication trends related to epidemics, and highlighting challenges in dataset filtering and monitoring for open science advancement.
Contribution
It provides a comprehensive evaluation of RNA virus datasets in Google Dataset Search, focusing on scope, reuse capacity, and correlation with pandemics, which is a novel analysis in this context.
Findings
Only 52% of datasets are related to scientific research.
Just 15% of datasets are reusable.
Publication of datasets has increased, especially during major epidemics.
Abstract
This paper presents an analysis of datasets collected via Google Dataset Search that relate to families of RNA viruses, whose terminology was obtained from the National Cancer Institute (NCI) thesaurus developed by the US Department of Health and Human Services. The objective is to determine the scope and reuse capacity of the available data: the number of datasets and whether they are freely accessible, the proportion in reusable download formats, the main providers, the publication chronology, and their scientific provenance. We also examine possible relationships between the publication of datasets and the main pandemics that have occurred during the last 10 years. The results show that only 52% of the datasets are related to scientific research, while an even smaller fraction (15%) are reusable. There is also an upward trend…
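The proportions the abstract reports (share of scientific datasets, share in reusable formats, publication chronology) could be computed along the following lines. This is only a minimal sketch, not the authors' method: the record fields, the sample records, and the set of formats treated as reusable are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical dataset records; field names and values are illustrative
# only and are not taken from the paper or from Google Dataset Search.
records = [
    {"title": "A", "formats": ["csv"], "scientific": True, "year": 2014},
    {"title": "B", "formats": ["pdf"], "scientific": False, "year": 2016},
    {"title": "C", "formats": ["html"], "scientific": True, "year": 2020},
    {"title": "D", "formats": [], "scientific": False, "year": 2020},
]

# Assumption: which download formats count as "reusable".
# The paper's own criterion is not given in the abstract.
REUSABLE_FORMATS = {"csv", "tsv", "json", "xml"}


def share(items, predicate):
    """Return the fraction of items for which predicate is true."""
    if not items:
        return 0.0
    return sum(1 for item in items if predicate(item)) / len(items)


def is_reusable(record):
    """A record is reusable if at least one of its formats is in the set."""
    return any(fmt.lower() in REUSABLE_FORMATS for fmt in record["formats"])


scientific_share = share(records, lambda r: r["scientific"])
reusable_share = share(records, is_reusable)

# Publication chronology: number of datasets per publication year.
datasets_per_year = Counter(r["year"] for r in records)
```

On real data, `datasets_per_year` could then be compared against the dates of major outbreaks to look for the upward trend the abstract mentions.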
Taxonomy
Topics: Data-Driven Disease Surveillance
