Prospective modeling and estimating the epidemiologically informative match rate within large foodborne pathogen genomic databases
Lanlan Yin, James B. Pettengill

TL;DR
This paper studies how often genetic matches occur between patient and non-patient isolates in foodborne pathogen databases to improve public health surveillance.
Contribution
The study introduces a model to estimate and predict match rates in genomic databases, emphasizing the need for non-clinical isolates.
Findings
Match rates of clinical to non-clinical isolates differ widely across pathogens: 46% for Salmonella (the highest), 33% for L. monocytogenes, and 7% for E. coli.
Logistic regression modeling shows good performance in predicting match rates based on database features.
The study highlights the importance of including non-clinical isolates to improve match identification.
Abstract
Much has been written about the utility of genomic databases to public health. Within food safety these databases contain data from two types of isolates—those from patients (i.e., clinical) and those from non-clinical sources (e.g., a food manufacturing environment). A genetic match between isolates from these sources represents a signal of interest. We investigate the match rate within three large genomic databases (Listeria monocytogenes, Escherichia coli, and Salmonella) and the smaller Cronobacter database; the databases are part of the Pathogen Detection project at NCBI (National Center for Biotechnology Information). Currently, the match rate of clinical isolates to non-clinical isolates is 33% for L. monocytogenes, 46% for Salmonella, and 7% for E. coli. These match rates are associated with several database features including the diversity of the organism, the database size,…
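As an illustration of the match rate described in the abstract, the sketch below computes the fraction of clinical isolates that share a genetic cluster with at least one non-clinical isolate. The function name `match_rate`, the `(isolate_id, cluster_id, source)` record layout, and the rule that shared cluster membership counts as a match are all assumptions made for this example; the abstract does not state the exact matching criterion used in the study.

```python
from collections import defaultdict


def match_rate(isolates):
    """Fraction of clinical isolates that match a non-clinical isolate.

    `isolates` is a list of (isolate_id, cluster_id, source) tuples, where
    source is "clinical" or "non-clinical" and cluster_id is None for an
    isolate that belongs to no cluster. Treating shared cluster membership
    as a match is an assumption of this sketch.
    """
    # Record which source types appear in each cluster.
    sources_by_cluster = defaultdict(set)
    for _isolate_id, cluster_id, source in isolates:
        if cluster_id is not None:
            sources_by_cluster[cluster_id].add(source)

    # Collect the cluster of every clinical isolate (None if unclustered).
    clinical_clusters = [c for _i, c, s in isolates if s == "clinical"]
    if not clinical_clusters:
        return 0.0

    # A clinical isolate is matched if its cluster also holds a non-clinical one.
    matched = sum(
        1
        for c in clinical_clusters
        if c is not None and "non-clinical" in sources_by_cluster[c]
    )
    return matched / len(clinical_clusters)


# Hypothetical toy data, not taken from the paper.
example = [
    ("c1", "A", "clinical"),
    ("c2", "A", "clinical"),
    ("n1", "A", "non-clinical"),
    ("c3", "B", "clinical"),      # cluster with clinical isolates only
    ("c4", None, "clinical"),     # unclustered isolate
    ("n2", "C", "non-clinical"),
]
rate = match_rate(example)
```

In this toy input, two of the four clinical isolates share cluster A with a non-clinical isolate, so the computed rate is 0.5. Unclustered clinical isolates stay in the denominator, which is one possible convention rather than a confirmed detail of the study.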
Taxonomy
Topics: Enterobacteriaceae and Cronobacter Research · Salmonella and Campylobacter epidemiology · Listeria monocytogenes in Food Safety
