Leveraging NCBI Genomic Metadata for Epidemiological Insights: Example of Enterobacterales
Bryan Harris, Majid Bani-Yaghoub

TL;DR
Using Enterobacterales as an example, this study shows how NCBI genomic metadata can be systematically mined for epidemiological insights into infectious diseases, revealing more diverse host sources and finer spatial-temporal patterns than traditional surveillance captures.
Contribution
The paper introduces a novel approach and an open-source tool for leveraging NCBI genomic data to enhance epidemiological analysis and disease surveillance.
Findings
NCBI data shows broader host diversity than traditional sources.
Seasonal trends in Enterobacterales are consistent across datasets.
The developed Python package enables real-time genomic metadata analysis.
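Host fields in NCBI metadata are free text, so a claim such as "broader host diversity" depends on how host values are cleaned and counted. The sketch below is illustrative and assumes host values arrive as plain strings. The placeholder set `MISSING`, the lower-case normalization, and the use of species richness plus the Shannon index are hypothetical choices, not the authors' stated method or their package's API.

```python
import math
from collections import Counter

# Hypothetical set of placeholder strings treated as "no host recorded".
# This is an illustrative assumption, not an exhaustive NCBI vocabulary.
MISSING = {"", "missing", "not collected", "not applicable", "not provided", "unknown", "na"}

def normalize_host(raw):
    """Lower-case and trim a free-text host value; return None if it is a placeholder."""
    if raw is None:
        return None
    value = raw.strip().lower()
    return None if value in MISSING else value

def host_diversity(hosts):
    """Return (richness, Shannon index) for a list of raw host strings.

    Richness is the number of distinct normalized hosts; the Shannon index
    H = -sum(p_i * ln p_i) also accounts for how evenly records are spread.
    """
    counts = Counter(h for h in map(normalize_host, hosts) if h is not None)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    shannon = sum(-(n / total) * math.log(n / total) for n in counts.values())
    return len(counts), shannon

richness, shannon = host_diversity(["Homo sapiens", "homo sapiens ", "Gallus gallus", "missing"])
print(richness)  # 2 distinct hosts after normalization
```

Running the same function on host values drawn from two sources gives a like-for-like comparison. Exact string normalization will undercount true diversity if synonyms (for example common and scientific names) are mixed, so a real pipeline would likely need a host-name mapping step.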
Abstract
Numerous studies have used NCBI data for genomic analysis, gene annotation, and the identification of disease-associated variants, yet NCBI's epidemiological potential remains underexplored. This study demonstrates how NCBI datasets can be systematically leveraged to extract and interpret infectious disease patterns across spatial and temporal dimensions. Using Enterobacterales as a case study, we analyzed over 477,000 genomic records and their associated metadata, including collection date, location, host species, and isolation source. We compared trends for Escherichia coli and Salmonella in NCBI data with those in CDC's National Outbreak Reporting System (NORS). While both datasets showed consistent seasonal peaks and foodborne sources, NCBI data revealed a broader range of host species (e.g., wildlife and environmental reservoirs), greater isolate diversity, and finer spatial-temporal resolution. These insights were enabled by…
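The seasonal comparison described in the abstract requires turning per-record metadata into monthly counts. The following is a minimal sketch under stated assumptions: `collection_date` values are ISO-style strings at year, month, or day precision (only month-level or finer values are usable for seasonality), and `geo_loc_name` follows a "Country: region" pattern. All function names are hypothetical and do not describe the authors' package. Correlating the two 12-month profiles is one simple way to check whether seasonal shapes agree.

```python
import re
from collections import Counter

# Assumed ISO-style dates: "YYYY-MM" or "YYYY-MM-DD". Year-only values,
# placeholders such as "missing", and other formats (e.g. ranges) are skipped.
DATE_RE = re.compile(r"^(\d{4})-(\d{2})(?:-(\d{2}))?$")

def collection_month(raw):
    """Return the month (1-12) of a collection_date string, or None if unusable."""
    match = DATE_RE.match((raw or "").strip())
    if not match:
        return None
    month = int(match.group(2))
    return month if 1 <= month <= 12 else None

def country_and_region(geo_loc_name):
    """Split an assumed 'Country: region' string into (country, region)."""
    country, _, region = (geo_loc_name or "").partition(":")
    return country.strip() or None, region.strip() or None

def monthly_profile(dates):
    """Count records per calendar month, returned as a 12-element list (Jan..Dec)."""
    counts = Counter(m for m in map(collection_month, dates) if m is not None)
    return [counts.get(m, 0) for m in range(1, 13)]

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else float("nan")

# Toy inputs for illustration only; these are not data from the study.
profile = monthly_profile(["2019-07-15", "2020-07", "2018", "missing", "2021-01-03", "2019-13"])
print(profile)  # only the three month-resolved, valid dates are counted
```

Because NORS records outbreaks rather than sequenced isolates, comparing the shape of the monthly profiles (via correlation) is more meaningful than comparing raw magnitudes; the NORS profile would be built separately from its own reporting fields.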
Taxonomy
Topics: Zoonotic diseases and public health · Data-Driven Disease Surveillance · Salmonella and Campylobacter epidemiology
