Classifying COVID-19 hospitalizations in epidemiology cohort studies: The C4R study

Elizabeth C. Oelsner; Akshaya Krishnaswamy; Rafail Rustamov; Pallavi P. Balte; Tauqeer Ali; Norrina B. Allen; Howard F. Andrews; Pramod Anugu; Alexander Arynchyn; Lori A. Bateman; Jianwen Cai; Harry Chang; Lucas Chen; Mitchell S. V. Elkind; James S. Floyd; Kelley Pettee Gabriel; Sina A. Gharib; Jose D. Gutierrez; Karen Hinckley Stukovsky; Virginia J. Howard; Carmen R. Isasi; Lauren Jager; Ling Jin; Suzanne E. Judd; Alka M. Kanaya; Namratha R. Kandula; Maureen R. Kelly; Sadiya S. Khan; Anna Kucharska-Newton; Joyce S. Lee; Emily B. Levitan; Cora E. Lewis; Barry J. Make; Kimberly Malloy; Jennifer J. Manly; David Mauger; Yuan-I Min; Joanne M. Murabito; Charles G. Murphy; Arnita F. Norwood; George T. O’Connor; Victor E. Ortega; Ashmi A. Patel; Amber Pirzada; Elizabeth A. Regan; Kimberly B. Ring; Wayne D. Rosamond; David A. Schwartz; James M. Shikany; Daniela Sotres-Alvarez; Cheryl Tarlton; Janis Tse; Elman M. Urbina Meneses; Maya Vankineni; Sally E. Wenzel; Prescott G. Woodruff; Vanessa Xanthakis; Ji Hyun Yang; Neil A. Zakai; Ying Zhang; Wendy S. Post; Kamal Sharma

PMC · DOI: 10.1371/journal.pone.0316198 · February 10, 2025


Open Access

TL;DR

This study developed a protocol to accurately classify COVID-19 hospitalizations in a large US cohort, showing high reliability and identifying complications missed by standard diagnosis codes.

Contribution

The study introduces a standardized protocol for adjudicating hospitalized COVID-19 cases in epidemiology cohorts, improving outcome classification accuracy.

Findings

1. Adjudication confirmed SARS-CoV-2 infection in 88% of ascertained potential COVID-19 hospitalizations and found that infection may have been incidental to the hospitalization in 7% of cases.

2. Pneumonia and acute kidney injury were common complications, while cardiovascular and thrombotic events were rare.

3. Discharge diagnosis codes had higher sensitivity for pneumonia and pulmonary embolism than for other complications.

Abstract

Robust COVID-19 outcomes classification is important for ongoing epidemiology research on acute and post-acute COVID-19 conditions. Protocolized medical record review is an established method to validate endpoints for clinical trials and cardiovascular epidemiology cohorts; however, a protocol to adjudicate hospitalizations for COVID-19 among epidemiology cohorts was lacking. We developed a protocol to ascertain and adjudicate hospitalized COVID-19 across a meta-cohort of 14 US prospective cohort studies. This report describes the first three years of protocol implementation (October 1, 2020—October 1, 2023) and evaluates its repeatability and performance compared to classification by administrative codes. The protocol was adapted from cohort approaches to clinical cardiovascular events ascertainment and adjudication. Potential COVID-19 hospitalizations and deaths were identified by…



Funding

National Heart, Lung, and Blood Institute (http://dx.doi.org/10.13039/100000050)



Full text

Introduction

Coronavirus disease 2019 (COVID-19) has been a leading cause of hospitalizations and deaths since 2020 [1, 2]. Large population-based studies with accurate characterization of hospitalized COVID-19 and its acute complications, as well as clinical, biomarker, and lifestyle data from before and after infection, are urgently needed to understand the mechanisms of acute disease and of post-acute sequelae of COVID-19 (PASC), which affect up to 57% of patients hospitalized with COVID-19 [3].

Many extant US cohort studies have collected extensive data on clinical and subclinical diseases and their risk factors, including behavior, cognition, biomarkers, and social determinants of health. Since enrollment for these studies was completed prior to the COVID-19 pandemic, they offer a unique opportunity to study risk factors for incident COVID-19 while minimizing the referral, survival, and recall biases that are common to COVID-19 case series and disease-based studies. However, in the absence of a US national healthcare system, most longstanding US epidemiology studies are not able to establish comprehensive linkages to electronic health records (EHRs) and have depended on medical records ascertainment and adjudication for cardiovascular and respiratory health outcomes. A protocol to define hospitalizations for COVID-19 among epidemiology cohorts was lacking.

The Collaborative Cohort of Cohorts for COVID-19 Research (C4R) study is a unique meta-cohort of NIH-funded prospective, observational epidemiology cohort studies that was funded to perform standardized, prospective ascertainment of COVID-19, including physician adjudication of medical records for COVID-19 hospitalizations and deaths (hereafter, “events”). The primary purpose of this report is to describe the C4R events adjudication protocol and preliminary experience applying the protocol from February 2021 to June 2023. This report also compares ICD-based versus physician-adjudicated, protocol-based classification of COVID-19-related fatal and non-fatal hospitalizations.

Materials and methods

Study design

C4R enrolled adult participants from 14 long-standing cardiovascular, neurological, and respiratory cohorts [4]: Atherosclerosis Risk in Communities (ARIC) [5, 6], Coronary Artery Risk Development in Young Adults (CARDIA) [7], Genetic Epidemiology of COPD (COPDGene) [8], Framingham Heart Study (FHS) [9], Hispanic Community Health Study/Study of Latinos (HCHS/SOL) [10–13], Jackson Heart Study (JHS) [14–16], Mediators of Atherosclerosis in South Asians Living in America (MASALA) [17, 18], Multi-Ethnic Study of Atherosclerosis (MESA) [19], Northern Manhattan Study (NOMAS) [20], Prevent Pulmonary Fibrosis (PrePF) [21], REasons for Geographic and Racial Differences in Stroke (REGARDS) [22], Severe Asthma Research Program (SARP) [23], Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) [24], and the Strong Heart Study (SHS) [25, 26]. Details on each of the component cohorts are provided in S1 Appendix and S1 Table. Cohort participants had previously consented to in-person, telephone, and/or e-mail contact and to abstraction of medical records.

C4R received funding in 2020 to perform standardized prospective data collection on COVID-19 and to harmonize pre-pandemic deep phenotyping available in the cohorts. Columbia University developed the standard questionnaires and protocols and served as the Data Coordination and Harmonization Center (DCHC) for C4R (Columbia University Institutional Review Board, AAAT3035). Of note, in some cohorts, COVID-19 questionnaires were initiated before the development of a standard C4R questionnaire; these questionnaires were later harmonized with the C4R questionnaire and classified as C4R questionnaires.

As previously reported [4], following a cohort ancillary studies model, researchers in each cohort study were directly responsible for accomplishing data collection in accordance with the standard protocols and under the supervision of their own observational studies monitoring board, steering committee, institutional review board (IRB), and any other applicable regulatory authorities. A full list of cohort IRBs supervising implementation of the C4R protocols is provided in S2 Table.

Following cohort-specific IRB approval and consent processes (including verbal, remote, and traditional written informed consent), adult participants in the pre-existing cohort studies were enrolled into the C4R ancillary study on a rolling basis from April 9, 2020, to February 28, 2023. COVID-19 data collection was accomplished by two waves of questionnaires (Wave 1: April 2020–May 2022; Wave 2: February 2021–February 2023), SARS-CoV-2 serosurvey (February 2021–February 2023), and ascertainment and adjudication of COVID-19 events, which are the subject of this report.

For the purposes of the work presented in this manuscript, data were accessed at Columbia University from July 1, 2021, to October 1, 2023.

Protocol development and coordination

The C4R events protocol is included in S2 Appendix. The protocol was developed by investigators and cohort personnel with experience in clinical events ascertainment [5, 7, 9, 11, 12, 17, 19, 20, 26–33] and approved by the C4R Cohort Coordinating Committee (CCC) in February 2021. Relevant administrative and review forms were coded into the C4R Events REDCap [34, 35] toolkit, which was made available to the cohorts via a central (Columbia) instance or for local adaptation by cohort data coordinating centers (DCCs). For the central instance, cohort-specific Data Access Groups (DAGs) ensured that each cohort was only able to enter or access its own participant data. Cohorts provided bimonthly tracking reports regarding cohort-specific elements of protocol completion to the DCHC, which integrated these reports and generated status reports to the Events Subcommittee, the CCC, and funding agencies. De-identified COVID-19 events data were made available for analysis on the C4R Analysis Commons and shared with cohort-specific DCCs for cohort use and transfer to other data sharing platforms.

COVID-19 events ascertainment

Potential COVID-19 hospitalizations and deaths were ascertained by cohorts via several mechanisms. C4R questionnaires, which were administered in two waves across all 14 cohorts, included questions regarding hospitalization for COVID-19 (S3 Table). If necessary (e.g., in cases of participant dementia or death), questionnaires were administered to a participant’s proxy. Additional COVID-19-related hospitalizations and deaths were identified by regular non-C4R follow-up calls conducted in 9 cohorts (ARIC, CARDIA, FHS, HCHS/SOL, JHS, MESA, NOMAS, REGARDS, SHS) to collect information on all-cause hospitalization and vital status. These data were supplemented with information collected at in-person exams, which were conducted in all cohorts during the pandemic period, except for NOMAS and REGARDS. Various non-questionnaire ascertainment methods, such as active surveillance of local EHR systems and other sources (e.g., obituaries), were performed by ARIC, FHS, JHS, SARP, SHS, and selected clinical sites in CARDIA, COPDGene, and MESA. Most cohorts supplemented vital status data with National Death Index (NDI) searches, although these are subject to reporting lags and were not available for the pandemic period at the time of this report.

After confirming participant/proxy consent, cohort staff requested copies of medical records for potential COVID-19-related hospitalizations and deaths, including physician notes (admission, consultation, discharge), radiology and laboratory reports, electrocardiogram reports, and discharge diagnoses (including discharge diagnosis ICD codes). Where applicable, death certificates were obtained. Medical records submitted for central review at the DCHC were de-identified prior to secure file transfer.

Adjudication

Eligibility for adjudication

Hospitalizations (fatal and non-fatal) and out-of-hospital deaths were eligible for protocolized medical record review if they were assigned, in any position, any of the following COVID-19 discharge diagnoses from the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) [36]: confirmed COVID-19 (U07.1), post-infectious state after COVID-19 (U09.9), multisystem inflammatory syndrome associated with COVID-19 (M35.81), personal history of COVID-19 (Z86.16), pneumonia due to coronavirus disease 2019 (J12.82), other viral pneumonia (J12.89), or, for events occurring before May 1, 2020, other coronavirus (B97.29). In the absence of these discharge diagnoses, records were eligible for review if there was evidence of a positive COVID-19 test, physician suspicion of COVID-19, or a next-of-kin interview indicating suspected or known COVID-19 infection. Episodes of treatment in the Emergency Department (ED) lasting >24 hours were classified as hospitalizations, since many hospitals used the ED for inpatient care during the peaks of COVID-19 surges.
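The eligibility rules above can be expressed as a simple screening function. This is a minimal sketch, not the C4R implementation: the `Encounter` record and its field names are hypothetical, and deaths in the ED, proxy reports, and reviewer discretion are not modeled.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# ICD-10-CM discharge codes that make a record eligible in any diagnosis position
COVID_CODES = {"U07.1", "U09.9", "M35.81", "Z86.16", "J12.82", "J12.89"}
# "Other coronavirus" qualifies only for events before May 1, 2020
EARLY_PANDEMIC_CODE = "B97.29"
EARLY_PANDEMIC_CUTOFF = date(2020, 5, 1)


@dataclass
class Encounter:
    """Hypothetical summary of one potential COVID-19 event."""
    event_date: date
    icd_codes: list = field(default_factory=list)
    inpatient_admission: bool = False
    ed_duration: timedelta = timedelta(0)
    out_of_hospital_death: bool = False
    positive_test: bool = False
    physician_suspicion: bool = False
    next_of_kin_reports_covid: bool = False


def counts_as_hospitalization(enc: Encounter) -> bool:
    # ED stays longer than 24 hours are treated as hospitalizations
    return enc.inpatient_admission or enc.ed_duration > timedelta(hours=24)


def eligible_for_adjudication(enc: Encounter) -> bool:
    # Only hospitalizations (fatal or non-fatal) and out-of-hospital deaths are reviewed
    if not (counts_as_hospitalization(enc) or enc.out_of_hospital_death):
        return False
    codes = set(enc.icd_codes)
    if codes & COVID_CODES:
        return True
    if EARLY_PANDEMIC_CODE in codes and enc.event_date < EARLY_PANDEMIC_CUTOFF:
        return True
    # Without a qualifying code, other evidence of COVID-19 is sufficient
    return (enc.positive_test or enc.physician_suspicion
            or enc.next_of_kin_reports_covid)
```

For example, a 30-hour ED stay on April 10, 2020 coded only B97.29 is eligible, whereas the same code on an admission in June 2020 is not, unless other evidence such as a positive test is recorded.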

Adjudication process

Eligible COVID-19-related hospitalizations (fatal and non-fatal) and out-of-hospital deaths were subjected to adjudication by physicians and/or clinical nurse practitioners with experience in evaluating hospitalized COVID-19. Reviewers were trained via webinar and individualized instruction. Three cohorts (FHS, REGARDS, SARP) elected to perform review by local adjudicators; records from the remaining cohorts were reviewed centrally at the DCHC. Adjudication entailed data abstraction, including information on oxygenation levels and medication administration, followed by classification of COVID-19 outcomes. All COVID-19 outcomes were defined as definite, probable, or absent, based on specific criteria, including symptoms and test results, and adjudicator judgment (S4 Table).

Adjudication of COVID-19 infection

Adjudication of definite COVID-19 infection required evidence of a positive SARS-CoV-2 test. Per protocol, an administrative criterion (i.e., assignment of ICD-10 code U07.1) was sufficient to classify definite COVID-19 infection, since this diagnosis code specifically requires positive testing for SARS-CoV-2; for all other outcomes, administrative criteria were insufficient for a definite classification but could be used to justify a probable classification. Definite COVID-19 infection was necessary to classify any other COVID-19 outcome as definite. Adjudication of probable COVID-19 infection required the expected signs and symptoms of COVID-19 without confirmatory testing. In the absence of adjudicated definite or probable COVID-19 infection, no additional outcomes were adjudicated.
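The infection rules and the way infection certainty constrains every other outcome can be sketched as follows. This is an illustrative simplification, assuming adjudicator judgment can be reduced to boolean flags; the function and parameter names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Certainty(Enum):
    DEFINITE = "definite"
    PROBABLE = "probable"
    ABSENT = "absent"


def classify_infection(positive_test: bool, icd_codes: set,
                       compatible_symptoms: bool) -> Certainty:
    # U07.1 itself requires a positive SARS-CoV-2 test, so the code alone suffices
    if positive_test or "U07.1" in icd_codes:
        return Certainty.DEFINITE
    # Expected signs and symptoms without confirmatory testing
    if compatible_symptoms:
        return Certainty.PROBABLE
    return Certainty.ABSENT


def cap_outcome(outcome: Certainty, infection: Certainty) -> Optional[Certainty]:
    """Apply the infection-level constraint to any other adjudicated outcome."""
    if infection is Certainty.ABSENT:
        return None  # no further outcomes are adjudicated
    if infection is Certainty.PROBABLE and outcome is Certainty.DEFINITE:
        return Certainty.PROBABLE  # definite outcomes require definite infection
    return outcome
```

Returning `None` rather than `ABSENT` keeps "not adjudicated" distinct from "adjudicated and absent".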

Adjudication of COVID-19 hospitalization

In the context of definite or probable SARS-CoV-2 infection, hospitalization “due to” COVID-19 was defined as admission to the hospital for COVID-19-related signs or symptoms, development of COVID-19-related signs or symptoms during hospitalization, or death in the ED. By contrast, hospitalization “with” COVID-19 was defined as a diagnosis of SARS-CoV-2 infection in a patient who was admitted for a reason other than COVID-19-related signs or symptoms and who did not develop COVID-19 signs or symptoms during hospitalization.
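The “due to” versus “with” distinction reduces to a short decision rule. A minimal sketch, assuming each criterion has already been judged from the record; the function and argument names are hypothetical.

```python
from typing import Optional


def attribute_hospitalization(infection_definite_or_probable: bool,
                              admitted_for_covid_symptoms: bool,
                              developed_covid_symptoms_in_hospital: bool,
                              died_in_ed: bool) -> Optional[str]:
    """Return 'due to', 'with', or None when infection was not adjudicated."""
    if not infection_definite_or_probable:
        return None  # attribution is only assessed after infection is established
    if (admitted_for_covid_symptoms or developed_covid_symptoms_in_hospital
            or died_in_ed):
        return "due to"
    # Infected, but admitted for another reason and never symptomatic in hospital
    return "with"
```

As an illustrative case, a patient with a positive admission screen who is hospitalized for an unrelated surgery and never develops COVID-19 symptoms would be classified as hospitalized “with” COVID-19.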

Adjudication of COVID-19 severity

Definitions of severe and critical disease were based on NIH COVID-19 treatment guidelines [37].

Adjudication of COVID-19 complications

While treating physician notes and administrative criteria could be used for probable classification of COVID-19 complications, definite classification of COVID-19 pneumonia, pulmonary embolism (PE), deep vein thrombosis (DVT), or stroke required radiologic evidence in the medical record. Definite classification of acute kidney injury (AKI) was based on reported blood creatinine levels or initiation of renal replacement therapy. Definite COVID-19 myocardial infarction (MI) was defined based on biomarker and electrocardiographic or pathologic criteria to identify cases of MI caused by acute atherothrombotic coronary artery disease, or “Type 1” MI, in the context of COVID-19 infection [38]. Probable COVID-19 MI was defined more broadly and likely captures both Type 1 and Type 2 MI. Myocardial injury was defined as a maximum recorded troponin level greater than two times the upper limit of normal (ULN), with or without associated ischemic symptoms.
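Two of these rules are mechanical enough to sketch: the myocardial injury threshold and the imaging requirement for definite pneumonia, PE, DVT, or stroke. This is a hedged illustration only; troponin units and ULN are assay-specific assumptions, the names are hypothetical, and the cap imposed by infection certainty is not applied here.

```python
from typing import Sequence


def has_myocardial_injury(troponin_values: Sequence[float], uln: float) -> bool:
    """Maximum recorded troponin above twice the assay's upper limit of normal.

    Values and ULN must be in the same units; ischemic symptoms are not required.
    """
    return len(troponin_values) > 0 and max(troponin_values) > 2 * uln


def classify_imaging_complication(radiologic_evidence: bool,
                                  clinician_note_or_icd_code: bool) -> str:
    """Certainty for pneumonia, PE, DVT, or stroke under the rules described above."""
    if radiologic_evidence:
        return "definite"  # definite classification requires imaging in the record
    if clinician_note_or_icd_code:
        return "probable"  # notes or administrative codes support probable only
    return "absent"
```

With a hypothetical ULN of 14 ng/L, a peak troponin of 35 ng/L meets the injury definition, while a peak of exactly 28 ng/L does not, because the threshold is strictly greater than 2 × ULN.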

Re-adjudication

Following completion of adjudication, reviewers could request a second independent adjudication for challenging cases. A random 10% subset of records was also submitted for a second independent adjudication.
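The re-adjudication sample combines a random audit with reviewer requests. A minimal sketch of that selection, assuming string record identifiers; the seed, function name, and the choice to pool requested records with the random draw are illustrative assumptions, not details stated in the protocol text.

```python
import random
from typing import Iterable, Set


def select_for_readjudication(record_ids: Iterable[str],
                              reviewer_requests: Iterable[str],
                              fraction: float = 0.10,
                              seed: int = 2021) -> Set[str]:
    """Random audit sample plus every record a reviewer flagged as challenging."""
    ids = sorted(set(record_ids))  # sort so a given seed gives a reproducible draw
    k = round(len(ids) * fraction)
    rng = random.Random(seed)
    audit_sample = set(rng.sample(ids, k))
    # Keep only requests that refer to known records
    return audit_sample | (set(reviewer_requests) & set(ids))
```

Using a dedicated `random.Random` instance with a fixed seed keeps the audit draw reproducible without affecting other uses of the global random state.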

Analysis

Characteristics of C4R participants, with and without an ascertained or adjudicated COVID-19 event, were tabulated. The incidence of events ascertained and adjudicated per month (based on date of event) was plotted. Since we found that records relating to out-of-hospital deaths (e.g., death certificates) did not contain sufficient data for adjudication, the following analyses were limited to adjudicated fatal and non-fatal COVID-19 hospitalizations among participants who consented to data sharing on the C4R Analysis Commons. The incidence of COVID-19-related outcomes was tabulated, by level of certainty. Interrater agreement was assessed via positive agreement, negative agreement, and the Cohen’s κ-statistic [39]. The performance of discharge diagnosis ICD code-based classification was compared against definite C4R classifications, which were treated as the reference standards; probable classifications were excluded from these comparisons because they included discharge diagnosis ICD codes in the diagnostic criteria. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated. Also, myocardial injury was compared to adjudicated classifications of definite COVID-19 MI. Analyses were performed in SAS Studio software (SAS Institute, Cary, NC) on the C4R Analysis Commons.
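The three agreement statistics named above can be computed from a 2×2 table of paired adjudications. A sketch using the standard formulas, with hypothetical counts (the counts behind Table 3 are not reproduced in this section); kappa is undefined when expected agreement equals 1, which this sketch does not guard against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedRatings:
    """2x2 table for two independent adjudications of one binary outcome."""
    both_yes: int     # a
    first_only: int   # b: first reviewer yes, second no
    second_only: int  # c: first reviewer no, second yes
    both_no: int      # d


def positive_agreement(t: PairedRatings) -> float:
    return 2 * t.both_yes / (2 * t.both_yes + t.first_only + t.second_only)


def negative_agreement(t: PairedRatings) -> float:
    return 2 * t.both_no / (2 * t.both_no + t.first_only + t.second_only)


def cohens_kappa(t: PairedRatings) -> float:
    n = t.both_yes + t.first_only + t.second_only + t.both_no
    observed = (t.both_yes + t.both_no) / n
    # Chance agreement from each reviewer's marginal "yes" and "no" rates
    first_yes = (t.both_yes + t.first_only) / n
    second_yes = (t.both_yes + t.second_only) / n
    expected = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)
    return (observed - expected) / (1 - expected)
```

For a hypothetical table with 40 concordant positives, 50 concordant negatives, and 5 discordant pairs in each direction, observed agreement is 0.90, chance agreement is 0.505, and kappa is about 0.80.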

Results

COVID-19 events ascertainment

Among 49,790 C4R participants, 1,974 potential COVID-19-related hospitalizations and/or deaths among 1,772 (3.6%) participants were ascertained between January 2020 and June 2023 (Fig 1), for an estimated incidence density rate of 11.9 per 1,000 person-years of follow-up. Overall, the cohorts reported that about half (53%) of these events were identified by a C4R questionnaire. As of June 2023, medical records had been requested for 1,768 (90%) events, of which 1,523 (86%) were obtained. The most common reasons for failure to obtain medical records were inability to obtain participant consent and lack of response from the hospital.
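An incidence density rate divides the number of events by accumulated person-time. The formula is shown below as a sketch, assuming the numerator counts ascertained events and the denominator sums each participant's follow-up; the text does not state how repeat events or follow-up start dates were handled, and the person-time total is not reported in this section, so the rate is not recomputed here.

```latex
\mathrm{IDR} = \frac{\text{number of events}}{\sum_{i=1}^{N} t_i} \times 1000,
\qquad t_i = \text{years of follow-up contributed by participant } i
```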

Fig 1. Incidence of COVID-19-related hospitalization and/or death over C4R follow-up, United States, January 2020–June 2023. Incidence is calculated per month and based on date of event.

Following 3 years of protocol funding (October 1, 2020–October 1, 2023), 1,237 of 1,974 (63%) events were adjudicated over 28 months. After exclusion of 57 out-of-hospital deaths with incomplete information and 13 non-fatal hospitalizations with consent restrictions, 1,167 events (59% of the 1,974 ascertained) were available for analysis, of which 1,030 (88%) had evidence of COVID-19 (Fig 2).
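The percentages in the ascertainment and adjudication flow can be re-derived from the reported counts. This short check uses only numbers stated in the text; the variable names are illustrative.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100 * part / whole)


# Counts reported for ascertainment from January 2020 to June 2023
ascertained = 1974
records_requested = 1768
records_obtained = 1523
adjudicated = 1237
excluded_out_of_hospital_deaths = 57
excluded_consent_restrictions = 13
with_covid_evidence = 1030

analysable = adjudicated - excluded_out_of_hospital_deaths - excluded_consent_restrictions

flow = {
    "requested / ascertained": pct(records_requested, ascertained),
    "obtained / requested": pct(records_obtained, records_requested),
    "adjudicated / ascertained": pct(adjudicated, ascertained),
    "analysable / ascertained": pct(analysable, ascertained),
    "COVID-19 evidence / analysable": pct(with_covid_evidence, analysable),
}
for step, value in flow.items():
    print(f"{step}: {value}%")
```

Each rounded value matches the corresponding percentage reported above (90%, 86%, 63%, 59%, and 88%), and the analysable total equals 1,167.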

Fig 2. CONSORT diagram of participants with COVID-19-related hospitalizations and deaths ascertained by C4R, January 2020–June 2023.

Table 1 compares socio-demographic and clinical characteristics of the 1,772 participants in whom a potential COVID-19 hospitalization was ascertained, including those whose events were (N = 1,098) and were not yet (N = 674) adjudicated as of October 2023, with those of participants with no COVID-19 hospitalization or death ascertained through June 2023 (N = 48,018). Compared with participants without a potential COVID-19 hospitalization or death, those with an ascertained event were more likely to be male, to report American Indian ancestry, and to have a history of pre-pandemic smoking, diabetes, or hypertension; they were less likely to have education beyond college and less likely to be vaccinated. Notably, among participants with ascertained potential COVID-19 events, characteristics of those with adjudicated records were generally similar to those of participants pending adjudication. More than one COVID-19 hospitalization was ascertained in 50 (3%) participants.

Table 1: Baseline characteristics of C4R participants according to ascertainment and adjudication of COVID-19 hospitalization or death.

Adjudication

Adjudication process

There were 7 reviewers at the C4R DCHC, 5 at FHS, 3 at REGARDS, and 2 at SARP. The median length of medical records was 59 pages (IQR: 37, 105). The average time to review was approximately 30 minutes per record.

Adjudicated outcomes

Definite SARS-CoV-2 infection was adjudicated in 87% of the 1,167 events, whereas probable infection was adjudicated in only 1%. Among the 1,030 adjudicated hospitalized events in which SARS-CoV-2 infection was definite or probable, COVID-19 illness was adjudicated as the cause of 952 (93%) hospitalizations. Table 2 describes the adjudicated probable and definite diagnoses of SARS-CoV-2 infection, acute COVID-19 illness severity, and acute COVID-19 complications following the criteria described in S4 Table: 77% of events were adjudicated as severe COVID-19, 31% as critical COVID-19, 80% as COVID-19-associated pneumonia, and 34% as COVID-19-associated AKI. Fatal hospitalization due to COVID-19 and other complications, such as PE, DVT, and MI, were less common.

Table 2: Adjudicated COVID-19 outcomes for COVID-19-related hospitalizations ascertained by C4R.

Adjudicator agreement

Among the 139 of 1,237 (11%) hospitalizations that underwent a second adjudication, agreement was almost perfect for COVID-19 infection, critical illness, stroke, PE, DVT, and fatal hospitalization (Table 3). Agreement was strong for hospitalization, severe illness, pneumonia, and renal failure, and moderate for MI.

Table 3: Interrater agreement for adjudication of COVID-19 outcomes in C4R.

Comparison of adjudication vs. ICD codes

Compared with adjudicated diagnoses at the definite certainty level, discharge ICD code-based classification was sensitive and specific for pneumonia (sensitivity 84%, specificity 90%) but less sensitive (57–81%) for cardiovascular and renal complications (Table 4). The PPV of ICD-based classification was excellent, ranging from 81% to 97%, for COVID-19-related pneumonia (J12.82, pneumonia due to coronavirus disease 2019; J12.89, other viral pneumonia), AKI (N17, acute kidney injury), PE (I26, pulmonary embolism), DVT (I82, deep vein thrombosis), and stroke (I63, cerebral infarction); however, the PPV for COVID-19-related MI (I21, acute myocardial infarction) was 33%.
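The four accuracy measures in Table 4 follow from a 2×2 cross-tabulation of code status against definite adjudication, with probable cases excluded as described in the Analysis section. A sketch with hypothetical counts (Table 4's underlying counts are not reproduced here); empty denominators return NaN.

```python
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    # Undefined metrics (empty denominator) are reported as NaN
    return numerator / denominator if denominator else float("nan")


@dataclass(frozen=True)
class CodeVsAdjudication:
    """2x2 counts: ICD code present/absent vs. definite adjudicated outcome yes/no."""
    true_pos: int   # code present, adjudicated definite
    false_pos: int  # code present, not adjudicated
    false_neg: int  # code absent, adjudicated definite
    true_neg: int   # code absent, not adjudicated

    @property
    def sensitivity(self) -> float:
        return _ratio(self.true_pos, self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        return _ratio(self.true_neg, self.true_neg + self.false_pos)

    @property
    def ppv(self) -> float:
        return _ratio(self.true_pos, self.true_pos + self.false_pos)

    @property
    def npv(self) -> float:
        return _ratio(self.true_neg, self.true_neg + self.false_neg)
```

Read this way, a PPV of 33% for I21 means that roughly one in three MI codes corresponded to an adjudicated definite COVID-19-related MI.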

Table 4: Sensitivity, specificity, and predictive values for discharge diagnosis codes versus adjudicated definite COVID-19 outcomes.

Comparison of adjudicated MI vs. myocardial injury

A substantial number of participants with adjudicated events had myocardial injury that was not adjudicated as COVID-19-related MI. Among the 1,030 hospitalized events in which SARS-CoV-2 infection was adjudicated as definite or probable, 170 (25%) demonstrated myocardial injury. Of these 170 events with myocardial injury, 25 (15%) were assigned an ICD code for MI, 5 (3%) were adjudicated as definite COVID-19-related MI, and 38 (22%) were adjudicated as probable COVID-19-related MI; the majority (75%) were not adjudicated as definite or probable COVID-19-related MI. Of note, these groups are not directly comparable with the groups described in Table 4, which only included events with ICD codes available and excluded events with probable MI diagnoses.

Discussion

Protocolized adjudication confirmed four out of five hospitalizations for COVID-19 in a US meta-cohort of prospective epidemiology cohorts and adjudicated cases of pneumonia, PE, and other conditions that were not indicated by discharge diagnosis codes. These results illustrate the importance of systematic medical record review for robust classification of COVID-19 outcomes in epidemiology cohort studies, which provide unique opportunities to study antecedent risk factors for acute COVID-19 and post-COVID conditions.

Over three years, C4R ascertained at least one potential COVID-19-related hospitalization or out-of-hospital death in 3.6% of participants, corresponding to an incidence density rate of 11.9 per 1,000 person-years of follow-up. For comparison, in 8 of the C4R cohorts, the incidence per 1,000 person-years for atherosclerotic cardiovascular disease events has been estimated at 9.7 [40]. Hence, the results of C4R events ascertainment to date highlight the major impact of COVID-19 on cohort participants and provide a substantial, longitudinal dataset to support well-powered analyses of risk factors and sequelae.

Nonetheless, our experience highlights certain limitations of standard cohort events surveillance for ascertainment of COVID-19-related events. Of hospitalizations that were ascertained as potentially COVID-19-related, SARS-CoV-2 infection could not be confirmed in one in eight events. These cases could have been due to incorrect self-reporting of a COVID-19 hospitalization on a C4R or cohort questionnaire, or to active surveillance methods that were more sensitive than specific. Of note, some cases in which infection could not be confirmed may have been true infections for which the medical records lacked sufficient detail for adjudication. Hence, rather than excluding all non-confirmed events, C4R assigns a certainty level to each of its COVID-19 outcomes, so that investigators can select the outcome definition most suitable for their specific research needs.

Our findings also demonstrate the limitations of discharge diagnosis codes in distinguishing hospitalization “for” versus “with” SARS-CoV-2 infection. Of hospitalizations with confirmed SARS-CoV-2 infection, symptoms of COVID-19 illness did not contribute to the hospitalization in 7% of cases. These findings are similar to prior reports that used EHR systems to examine the role of SARS-CoV-2 in hospitalizations. Adjudicated C4R outcomes will allow investigators to exclude these cases of hospitalization with incidental COVID-19 from studies designed to examine risks and sequelae of severe COVID-19 illness.

Furthermore, we found that discharge diagnosis codes were insensitive for cardiovascular and renal complications. We adjudicated a substantial number of cases of COVID-19 pneumonia, AKI, and MI among records without a corresponding discharge diagnosis code. This may be related, in part, to the major challenges of charting during COVID-19 surges, when documentation standards were modified to prioritize direct patient care activities. Even before the pandemic, however, previous investigations of the use of administrative data to identify pneumonia found that ICD codes are imprecise and can leave a substantial number of pneumonia cases undetected [41, 42]. These results are supported by earlier findings that ICD-10 code N17 often misses AKI during hospitalizations of kidney transplant patients [43]. Potential misestimation of the incidence of MI using claims or administrative data has also been well described [44–47]. This has been one justification for the longstanding, albeit labor- and time-intensive, atherosclerotic cardiovascular disease (ASCVD) events adjudication programs in many of the C4R cohorts.

As expected, we found that a substantial proportion of hospitalized events included evidence of myocardial injury, defined as a maximum troponin value more than twice the upper limit of normal; however, only a small subset of these cases was adjudicated as COVID-19 MI. We elected to restrict definite MI to confirmed cases of type 1 MI (caused by acute atherothrombotic coronary artery disease) or type 3 MI (MI resulting in death with pathological evidence of MI). Based solely on review of obtained medical records, it may be difficult to differentiate type 2 MI (supply-demand mismatch), which requires documentation of a rise and fall in troponin levels and symptoms of ischemia, from myocardial injury (at least one elevated troponin level). Prior studies demonstrated that elevated troponin values are common in patients hospitalized with COVID-19 [48–50] and may occur with conditions other than MI, such as cardiomyopathy, acute cor pulmonale, arrhythmias, or cardiogenic shock [51]. Our results suggest that type 1 MI, due to plaque disruption and thrombosis leading to coronary occlusion, was rare in the context of acute COVID-19, supporting the utility of protocolized adjudication in validating these outcomes for COVID-19 cardiovascular research.

Altogether, our findings have several implications for EHR-based studies that do not adjudicate endpoints. Our results suggest that ICD-based event definitions could overestimate the number of hospitalizations due to COVID-19 illness and underestimate COVID-19-related complications. In addition to misestimating the prevalence and incidence of COVID-19-related outcomes, the observed measurement errors could reduce the robustness and reproducibility of epidemiologic analyses. If ICD misclassification were nondifferential with respect to risk factors of interest, it would reduce precision and increase the risk of type 2 error; if misclassification were associated with a risk factor of interest, it could bias results toward or away from the null hypothesis. Of note, EHR-based studies often use complex algorithms to define clinical outcomes and do not rely on ICD codes alone. Nonetheless, our findings underscore the importance of validating outcome definitions against robust approaches such as protocolized events adjudication.
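The direction of these errors follows from a standard identity: the share of events flagged by a code with sensitivity Se and specificity Sp is Se·p + (1 − Sp)·(1 − p), where p is the true proportion. The sketch below uses hypothetical inputs, not C4R estimates.

```python
def apparent_proportion(true_proportion: float, sensitivity: float,
                        specificity: float) -> float:
    """Expected share of events flagged by a code with the given accuracy.

    Flagged = true positives + false positives
            = Se * p + (1 - Sp) * (1 - p)
    """
    p = true_proportion
    return sensitivity * p + (1 - specificity) * (1 - p)


# Hypothetical: a complication present in 10% of hospitalizations,
# coded with 60% sensitivity and 98% specificity (about 0.078, an underestimate)
complication = apparent_proportion(0.10, 0.60, 0.98)

# Hypothetical: 90% of infected admissions are truly "due to" COVID-19, but the
# code also flags every incidental infection (specificity 0), giving an overestimate
due_to_covid = apparent_proportion(0.90, 1.00, 0.00)
```

Low sensitivity pulls the apparent proportion below the true value, while poor specificity pushes it above, matching the under- and overestimation patterns described above.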

Strengths and limitations

Strengths of this study include prospective ascertainment of potential COVID-19-related hospitalizations and deaths in a well-characterized, multi-ethnic, US community-based sample of adults that is relatively free of referral, survival, and recall biases compared with studies that only include hospitalized patients. The adjudication protocol, which was modeled on gold-standard epidemiology cohort events adjudication in many of the C4R cohorts, was fully standardized across the cohorts and implemented at scale to generate robust events data for analysis within three years of program initiation.

Nonetheless, certain limitations must be considered, in addition to those noted above. Medical records have not yet been obtained or adjudicated for 37% of events, owing to lack of participant consent, lack of response from hospital systems, or other operational delays that are common to cohort events ascertainment. This highlights the need for novel approaches for cohorts to access medical records, such as asking participants to share with cohorts their own electronic medical records, which are now available to patients under the 21st Century Cures Act [52]. The characteristics of participants with missing versus available medical records in C4R were comparable, so there was no clear evidence of selection bias. Some medical records had incomplete data available for adjudication, which could be related to relaxation of documentation requirements during the pandemic. Few out-of-hospital deaths were ascertained, and medical records for out-of-hospital deaths were often incomplete for various reasons, including the family of the deceased lacking the court documents needed to obtain records, or hospital decisions not to grant access to the decedent’s medical records for research purposes. To address this limitation, additional information on out-of-hospital deaths associated with COVID-19 will be obtained from the NDI, which provides complete ascertainment of deaths in the US, including ICD codes; unfortunately, there are substantial time lags inherent in NDI reporting.

Conclusions

Ascertainment and adjudication of COVID-19 hospitalizations and deaths in longstanding NIH-funded cohort studies were feasible, albeit time- and resource-intensive, and our results illustrate the importance of systematic medical records adjudication for robust classification of COVID-19 events. Adjudication confirmed SARS-CoV-2 infection in 88% of ascertained events and found that infection may have been incidental to the hospitalization in 7% of cases. Compared with adjudication, discharge diagnosis codes were insensitive for acute cardiovascular and renal complications of COVID-19. Novel approaches to expedite medical records access and linkage would expand the unique opportunities that NIH-funded epidemiology cohorts offer for research on COVID-19 and other health outcomes, particularly emerging diseases.
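The finding that discharge diagnosis codes were insensitive treats medical-record adjudication as the reference standard. A minimal sketch of that comparison follows, assuming each hospitalization carries one binary adjudicated status and one binary coded status for a given complication. The `Event` class, the helper functions, and the counts are hypothetical and are not C4R results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One hospitalization, scored for a single complication."""
    adjudicated: bool  # reference standard: medical-record adjudication
    coded: bool        # complication present among discharge diagnosis codes


def sensitivity(events):
    """Proportion of adjudicated-positive events also flagged by the codes."""
    positives = [e for e in events if e.adjudicated]
    if not positives:
        raise ValueError("no adjudicated-positive events")
    return sum(e.coded for e in positives) / len(positives)


def specificity(events):
    """Proportion of adjudicated-negative events the codes leave unflagged."""
    negatives = [e for e in events if not e.adjudicated]
    if not negatives:
        raise ValueError("no adjudicated-negative events")
    return sum(not e.coded for e in negatives) / len(negatives)


# Hypothetical counts, not C4R results: 4 adjudicated complications,
# of which the discharge codes capture only 1.
events = (
    [Event(adjudicated=True, coded=True)] * 1
    + [Event(adjudicated=True, coded=False)] * 3
    + [Event(adjudicated=False, coded=False)] * 6
)

print(f"sensitivity = {sensitivity(events):.2f}")
print(f"specificity = {specificity(events):.2f}")
```

In this toy example the codes never flag a complication that adjudication rejects (specificity 1.00) but miss three of the four that adjudication confirms (sensitivity 0.25), which is the pattern the word "insensitive" describes.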

Supporting information

S1 Appendix. Cohort descriptions. (DOCX)

S2 Appendix. C4R events protocol. (PDF)

S1 Table. Characteristics of participants in C4R cohorts, United States, March 1, 2020. (DOCX)

S2 Table. Cohort Institutional Review Boards (IRBs) supervising implementation of the C4R protocols. (DOCX)

S3 Table. Selected COVID-19 questionnaire elements and their inclusion by cohorts in questionnaires for C4R, United States, April 2020–February 2023. (DOCX)

S4 Table. Criteria for classification of COVID-19 diagnoses as definite or probable based on medical record review. (DOCX)

Bibliography

1. Shiels MS, Haque AT, Berrington de González A, Freedman ND. Leading Causes of Death in the US During the COVID-19 Pandemic, March 2020 to October 2021. JAMA Internal Medicine. 2022;182(8):883–6. doi: 10.1001/jamainternmed.2022.2476. PMID: 35788262; PMCID: PMC9257676.
2. Chen C, Haupert SR, Zimmermann L, Shi X, Fritsche LG, Mukherjee B. Global Prevalence of Post-Coronavirus Disease 2019 (COVID-19) Condition or Long COVID: A Meta-Analysis and Systematic Review. The Journal of Infectious Diseases. 2022;226(9):1593–607. doi: 10.1093/infdis/jiac136. PMID: 35429399; PMCID: PMC9047189.
3. Oelsner EC, Krishnaswamy A, Balte PP, Allen NB, Ali T, Anugu P, et al. Collaborative Cohort of Cohorts for COVID-19 Research (C4R) Study: Study Design. American Journal of Epidemiology. 2022;191(7):1153–73. doi: 10.1093/aje/kwac032. PMID: 35279711; PMCID: PMC8992336.
4. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The ARIC investigators. Am J Epidemiol. 1989;129(4):687–702. Epub 1989/04/01. PMID: 2646917.
5. Wright JD, Folsom AR, Coresh J, Sharrett AR, Couper D, Wagenknecht LE, et al. The ARIC (Atherosclerosis Risk In Communities) Study: JACC Focus Seminar 3/8. J Am Coll Cardiol. 2021;77(23):2939–59. doi: 10.1016/j.jacc.2021.04.035. PMID: 34112321; PMCID: PMC8667593.
6. Friedman GD, Cutter GR, Donahue RP, Hughes GH, Hulley SB, Jacobs DR Jr., et al. CARDIA: study design, recruitment, and some characteristics of the examined subjects. J Clin Epidemiol. 1988;41(11):1105–16. Epub 1988/01/01. doi: 10.1016/0895-4356(88)90080-7. PMID: 3204420.
7. Regan EA, Hokanson JE, Murphy JR, Make B, Lynch DA, Beaty TH, et al. Genetic epidemiology of COPD (COPDGene) study design. COPD. 2010;7(1):32–43. Epub 2010/03/11. doi: 10.3109/15412550903499522. PMID: 20214461; PMCID: PMC2924193.
8. Tsao CW, Vasan RS. Cohort Profile: The Framingham Heart Study (FHS): overview of milestones in cardiovascular epidemiology. Int J Epidemiol. 2015;44(6):1800–13. Epub 2015/12/26. doi: 10.1093/ije/dyv337. PMID: 26705418; PMCID: PMC5156338.