Concordance Between Survey and Electronic Health Record Data in the COVID-19 Citizen Science Study: Retrospective Cohort Analysis
Elizabeth Crull, Emily C O'Brien, Pavel Antiperovitch, Kirubel Asfaw, Alexis L Beatty, Djeneba Audrey Djibo, Alan F Kaul, John Kornak, Gregory M Marcus, Madelaine Faulkner Modrow, Jeffrey E Olgin, Jaime Orozco, Soo Park, Noah Peyser, Mark J Pletcher, Thomas W Carton

TL;DR
This study compares participant-reported survey data from the COVID-19 Citizen Science study with linked electronic health records to assess agreement on demographics, chronic conditions, and COVID-19 characteristics.
Contribution
The study quantifies concordance between self-reported and EHR data in a large linked cohort and highlights discrepancies in key variables such as vaccination status.
Findings
Overall agreement was high for most demographic variables, but race, ethnicity, and smoking status showed notable discordance.
Self-reported vaccination rates (97.4%) were much higher than those recorded in EHRs (48.4%), suggesting that EHRs may capture vaccinations incompletely.
Sleep apnea had the highest sensitivity (83.5%) among medical conditions, while anemia had the lowest (32.8%).
Abstract
Real-world data reported by patients and extracted from electronic health records (EHRs) are increasingly leveraged for research, policy, and clinical decision-making. However, the extent to which these 2 data sources agree is not always clear. This study aimed to evaluate the concordance between variables reported by participants enrolled in an electronic cohort study and data available in their EHRs. Survey data from COVID-19 Citizen Science, an electronic cohort study, were linked to EHR data from 7 health systems, comprising 34,908 participants. Concordance was evaluated for demographics, chronic conditions, and COVID-19 characteristics. Overall agreement, sensitivity, specificity, positive predictive value, negative predictive value, and κ statistics with 95% CIs were calculated. Of 34,017 participants with complete information, 62.3% (21,176/34,017) reported…
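The concordance measures named in the abstract can all be derived from a 2×2 cross-tabulation of the two data sources. The following is a minimal sketch in Python, not the authors' analysis code. It assumes one source is treated as the reference standard for sensitivity and specificity (the excerpt above does not say which), the names `TwoByTwo` and `concordance` are hypothetical, and the counts are made-up illustrative values rather than study data. The 95% CIs are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cross-tabulation of a binary variable from two sources.

    Assumption: one source is designated the reference standard.
    """
    tp: int  # positive in both sources
    fp: int  # positive in comparison source, negative in reference
    fn: int  # negative in comparison source, positive in reference
    tn: int  # negative in both sources


def concordance(t: TwoByTwo) -> dict:
    """Return agreement, sensitivity, specificity, PPV, NPV, and Cohen's kappa.

    Assumes every row and column total is non-zero.
    """
    n = t.tp + t.fp + t.fn + t.tn
    observed = (t.tp + t.tn) / n  # overall agreement

    # Agreement expected by chance, from the marginal proportions.
    chance_pos = ((t.tp + t.fp) / n) * ((t.tp + t.fn) / n)
    chance_neg = ((t.fn + t.tn) / n) * ((t.fp + t.tn) / n)
    expected = chance_pos + chance_neg

    return {
        "agreement": observed,
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.fp + t.tn),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.fn + t.tn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Illustrative counts only; these are not results from the study.
example = TwoByTwo(tp=40, fp=10, fn=20, tn=30)
metrics = concordance(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because κ corrects observed agreement for agreement expected by chance, it can be low even when overall agreement is high, which is one reason to report it alongside the other measures.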
Taxonomy
Topics: Data-Driven Disease Surveillance · Mobile Health and mHealth Applications · Electronic Health Records Systems
