Assessing Risk Factors for Cognitive Decline Using Electronic Health Record Data: A Scoping Review
Liqin Wang, Richard Yang, Ziqin Sha, Anna Maria Kuraszkiewicz, Conrad Leonik, Li Zhou, Gad A. Marshall

TL;DR
This scoping review explores how electronic health records help identify risk factors for cognitive decline, focusing on medical conditions, interventions, and lifestyle factors.
Contribution
The study systematically maps EHR-based research on cognitive decline risk factors and identifies key research gaps.
Findings
Most studies focused on medical conditions linked to increased cognitive decline risk.
Medical interventions were found to often reduce the risk of cognitive decline.
Lifestyle, socioeconomic, and environmental factors were less studied compared to medical conditions.
Abstract
The data and information contained within electronic health records (EHR) provide a rich, diverse, longitudinal view of real-world patient histories, offering valuable opportunities to study antecedent risk factors for cognitive decline. However, the extent to which such records’ data have been utilized to elucidate the risk factors of cognitive decline remains unclear. A scoping review was conducted following the PRISMA guideline, examining articles published between January 2010 and April 2023, from PubMed, Web of Science, and CINAHL. Inclusion criteria focused on studies using EHR to investigate risk factors for cognitive decline. Each article was screened by at least two reviewers. Data elements were manually extracted based on a predefined schema. The studied risk factors were classified into categories, and a research gap was identified. From 1,593 articles identified, 80 were…
BACKGROUND
Alzheimer’s disease (AD) presents a substantial global public health challenge, given its hallmark features of chronic cognitive and functional decline in older adults. The condition is commonly categorized into three stages based on cognitive impairment severity: preclinical, where individuals exhibit normal cognitive function with or without subtle concerns but have biological evidence of underlying AD; prodromal, marked by mild cognitive impairment (MCI); and the dementia stage, characterized by significant functional impairment affecting daily life.^1, 2^ As of 2023, an estimated 6.7 million Americans are living with AD in its dementia stage, and this number is projected to reach 12.7 million by 2050.^3^ This not only poses a substantial financial burden but also profoundly impacts affected individuals, their families, and the healthcare system. Consequently, there is an urgent need to comprehensively grasp the risk factors associated with dementia and identify potential prevention and treatment strategies to mitigate this growing concern.
Existing studies have frequently relied on prospective datasets, which tend to suffer from limitations such as small sample sizes and underrepresentation of understudied populations, resulting in notable gaps in research on AD and related dementias (ADRD).^4, 5^ There is a growing consensus in the scientific community on the necessity of exploring more extensive and diverse populations.^1^
Electronic Health Record (EHR) data have proven pivotal in understanding the progression and outcomes of neurodegenerative diseases, particularly due to their chronic and gradually advancing nature. The widespread adoption of EHRs over recent decades has yielded a vast amount of longitudinal patient data. By sifting through these real-world datasets, we can gain deeper insights into the onset and evolution of AD and related dementias (ADRD), especially among populations that have been consistently engaged with the healthcare system. EHRs can be valuable in identifying potential risk factors for ADRD that might be missed in smaller convenience sample datasets. Moreover, they can highlight interventions that target certain medical problems that potentially affect the risk of dementia, particularly during early stages such as preclinical AD and MCI.
However, the extent to which EHR data have been utilized for such research remains unclear. While prior literature reviews have primarily focused on specific ADRD risk areas,^6–8^ none, to our knowledge, has specifically addressed the utilization of EHR data for analyzing ADRD risks. Our study aims to fill this gap by concentrating on the identification of risk factors for cognitive decline, with a specific emphasis on MCI and dementia. We have intentionally excluded the preclinical stage of cognitive decline from our analysis due to diagnostic challenges in the clinical setting where biomarkers of AD are not commonly obtained prior to the stage of MCI. Through this scoping review, we aim to thoroughly aggregate existing literature on EHR data usage for studying these stages of cognitive decline and highlight potential areas for future research.
METHODS
Search Strategy
This scoping review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.^9^ We conducted a Boolean search in PubMed, Web of Science, and CINAHL, identifying English-language studies published between January 1, 2010, and April 30, 2023. Our search included keywords related to cognitive impairment stages, such as dementia, MCI, and normal cognition, as well as EHR-related terms like “electronic health records”. The specific queries for individual databases can be found in Supplementary Table 1. This study does not involve direct experimentation on human or animal subjects. All procedures and analyses comply with the ethical standards of the institution.
Study Selection
We included studies that utilized EHR datasets to investigate the association between potential risk factors and dementia outcomes. The EHR datasets referred to data extracted from EHR systems, not the active EHR systems themselves. We excluded review articles without original data, non-English articles, studies focused on patients with preexisting cognitive impairment at baseline, those with small sample sizes (n < 100) or short follow-up times (< 1 year), non-epidemiology studies (e.g., algorithm evaluation), and low-quality studies with missing or unclear components (e.g., unclear diagnostic criteria for outcomes).
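The exclusion criteria above can be read as a checklist applied to each candidate article. The following is a minimal sketch of that checklist, not the reviewers' actual procedure (screening was done by human reviewers); the `CandidateStudy` class, its field names, and the reason strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateStudy:
    # Hypothetical fields, one per criterion stated in the text.
    uses_ehr_dataset: bool
    has_original_data: bool          # False for review articles without original data
    in_english: bool
    cognitively_impaired_at_baseline: bool
    sample_size: int
    follow_up_years: float
    is_epidemiological: bool         # False for, e.g., algorithm evaluation
    has_clear_outcome_criteria: bool


def exclusion_reasons(study):
    """Return every stated exclusion criterion that the study triggers."""
    checks = [
        (not study.uses_ehr_dataset, "does not use an EHR dataset"),
        (not study.has_original_data, "review without original data"),
        (not study.in_english, "non-English article"),
        (study.cognitively_impaired_at_baseline, "cognitive impairment at baseline"),
        (study.sample_size < 100, "sample size < 100"),
        (study.follow_up_years < 1, "follow-up < 1 year"),
        (not study.is_epidemiological, "non-epidemiology study"),
        (not study.has_clear_outcome_criteria, "missing or unclear components"),
    ]
    return [reason for triggered, reason in checks if triggered]


# Two made-up candidates: one eligible, one failing two criteria.
eligible = CandidateStudy(True, True, True, False, 25000, 8.0, True, True)
too_small = CandidateStudy(True, True, True, False, 80, 0.5, True, True)

print(exclusion_reasons(eligible))
print(exclusion_reasons(too_small))
```

An empty list means the candidate passes every stated criterion; returning all triggered reasons, rather than the first one, makes it easier to report exclusion counts by reason.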
Screening Process
After eliminating duplicates and using automation tools (e.g., classification by the search engine, keyword-based search of the title and abstract) to exclude articles deemed ineligible, we obtained abstracts from the search results. Two reviewers independently assessed titles and abstracts based on the inclusion and exclusion criteria, resolving disagreements through discussion to reach a consensus. Subsequently, two reviewers independently performed full-text screening, with a senior reviewer addressing any disagreements.
Data Extraction
We extracted articles assessing risk factors for the onset of cognitive decline, including MCI, AD, and other dementias. We assessed methodological quality and developed a data extraction schema based on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist for observational studies.^10^ Extracted data included article information (authors and year), objectives, study design (e.g., cohort or case-control), study cohort, sample size, follow-up duration, data sources, explored risk factors, confounding variables, outcomes and measurement, statistical methods, and key findings. Each article underwent independent extraction by two reviewers, with discrepancies resolved through discussion or consultation with a third reviewer.
Article Classification
After extracting risk factors from articles, we categorized them into major groups, including medical conditions, medical interventions, lifestyle, socioeconomic, psychosocial, and environmental factors. These groups were further subdivided; for example, medical conditions included cardiovascular and metabolic conditions, as well as psychiatric conditions. Some articles covered multiple risk factors, leading to overlap across categories.
RESULTS
Figure 1 shows the PRISMA flow diagram. The initial search yielded 1,593 articles: 565 from PubMed, 538 from Web of Science, and 490 from CINAHL. We removed 496 duplicate articles, where the same article appeared in more than one database. Automated tools marked 74 articles as ineligible, which included 3 case reports, 28 review articles, and 43 articles without abstracts. We also excluded 95 additional records: 42 datasets, 25 preprints, 13 authorless articles, 9 patents, 3 genetic studies, 2 books, and 1 thesis. Subsequently, during title and abstract screening, 832 articles were excluded for not meeting the criteria. The remaining 96 articles underwent full-text screening. After excluding an additional 16 articles (e.g., those not primarily using EHR data or having a small sample size), 80 articles remained for final analysis. A detailed list of these articles and extracted data is available in Supplementary Table 2.
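The flow counts reported above can be checked with simple arithmetic. The short sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Records identified per database, as reported in the text.
identified = {"PubMed": 565, "Web of Science": 538, "CINAHL": 490}
total_identified = sum(identified.values())

duplicates_removed = 496
automation_excluded = {"case reports": 3, "review articles": 28, "no abstract": 43}
other_records_removed = {
    "datasets": 42, "preprints": 25, "authorless articles": 13,
    "patents": 9, "genetic studies": 3, "books": 2, "thesis": 1,
}

# Records entering title and abstract screening.
screened = (
    total_identified
    - duplicates_removed
    - sum(automation_excluded.values())
    - sum(other_records_removed.values())
)

title_abstract_excluded = 832
full_text_assessed = screened - title_abstract_excluded

full_text_excluded = 16
included = full_text_assessed - full_text_excluded

print(total_identified, screened, full_text_assessed, included)
# prints: 1593 928 96 80
```

The intermediate figure of 928 records entering title and abstract screening is not stated in the text; it follows from the reported counts.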
Research Trend Over Time
Figure 2 illustrates the distribution of analyzed articles by publication year. It shows a notable increase in publications related to our topic over the past decade, indicating a growing trend in using EHR data to examine ADRD risk factors. Although our search spanned from 2010 to 2023, all included articles were published after 2014. More than one-quarter of the articles (n = 22, 27.5%) were published in 2022. It is important to note that our search was conducted up to April 2023; therefore, the total for that year does not reflect the full annual count.
Study Design
Out of the 80 articles reviewed, 77 (96.3%) were longitudinal studies conducted retrospectively, comprising 70 cohort studies, six case-control studies, and one randomized controlled trial. Longitudinal studies had a median EHR duration of 16 years, calculated from the initial year to the final year of the EHR records utilized, regardless of individual patient follow-up time. Among these, 16 studies (20%) had EHR data spanning under 10 years, 39 studies (48.8%) ranged between 10 and 20 years, and 22 studies (27.5%) had data duration exceeding 20 years.
Methods for Statistical Analyses
In the statistical analysis, 76.3% of the studies (n = 61) predominantly used survival analysis to model and identify various risk or protective factors. Among these, most (n = 54, 88.5%) opted for the Cox proportional-hazards regression model,^11^ while some (n = 13, 21%) used the Fine-Gray model,^12^ often in combination. The Fine-Gray model was chosen for its ability to handle competing risks like death. Other statistical analysis methods included logistic regression, Chi-squared test, and analysis of variance (ANOVA).
EHR Datasets and Sources
The included articles utilized diverse datasets to examine ADRD risk factors. These datasets were derived either directly from EHR systems, such as Veterans Health Administration (VHA), or linked to EHR databases to incorporate specific variables or outcomes from external databases, such as the UK Biobank. Categorized by geographical location, almost half of the studies (46.3%, n = 37) used data from EHR systems within the United States (US), while 40% (n = 32) utilized datasets from the United Kingdom (UK). Additional countries represented in this review included Australia (n = 3),^13–15^ China (n = 3),^16–18^ Denmark (n = 3),^19–21^ the Netherlands (n = 3),^20–22^ Taiwan (n = 2),^23, 24^ Canada (n = 2),^25, 26^ and Sweden (n = 2).^27, 28^
In the US, the most frequently used EHR dataset was derived from Kaiser Permanente’s EHR (11 studies), followed by the VHA (6 studies). The remaining 21 articles used databases from other US healthcare systems and commercial sources like TriNetX,^29–31^ IBM Explorys,^32^ and Optum.^33^ For studies utilizing UK datasets, the Whitehall II study^34–39^ (n = 8) and UK Biobank^40–46^ (n = 7) cohorts were the most frequently used, linked to various UK EHR datasets, including the Hospital Episode Statistics,^47^ Scottish Morbidity Record data,^48^ and Patient Episode Database.^40, 41, 46^ Other frequently used databases in the UK studies included the Clinical Practice Research Datalink (n = 6)^49–53^ and the Health Improvement Network (THIN) (n = 4).^21, 54, 55^
EHR Dataset Sample Size
The studies employed datasets with varying sample sizes, from hundreds to millions of patients. Only one study had fewer than 1,000 patients.^25^ Twenty-six (32.5%) studies had datasets ranging from 1,000 to 10,000 patients; 46 (57.5%) studies had datasets with 10,000 to one million patients. Seven (8.8%) studies used datasets with over one million patients.
Outcomes and Measurements
Most studies (n = 67) examined multiple dementia subtypes, including AD, vascular dementia, Lewy body dementia (LBD), frontotemporal dementia (FTD), and mixed dementia. AD was consistently included in all studies, with nine studies exclusively focused on AD. The majority of these studies defined outcomes using standard coding systems, such as ICD codes (81.3%, n = 65), Read codes (11.3%, n = 9), and SNOMED-CT (2.5%, n = 2). Additionally, some studies employed alternative methods, including prescriptions for dementia medications,^16, 18, 26, 33^ cognitive function tests,^25, 42, 56^ referencing the Diagnostic and Statistical Manual of Mental Disorders (Fourth Edition),^22, 57, 58^ screening interviews,^14, 56^ and neuroimaging.^42^
Risk and Protective Factors
We summarized the analyzed risk factors in the reviewed articles, categorizing medical conditions and interventions into broad disease categories (Table 1). Other risk factors were classified into lifestyle, socioeconomic, environmental, and miscellaneous categories (Table 2).
Medical conditions
Out of the 80 articles reviewed, 39 (48.8%) explored the interplay between medical conditions and ADRD. Of them, 15 articles focused on cardiovascular and metabolic conditions, 11 on infections, inflammatory and immune-related conditions, 7 on neurological/ophthalmological conditions, 5 on physical function and frailty, 4 on psychiatric conditions, 2 on cancer,^45, 59^ and 4 on other risk factors like kidney disease,^23, 36^ osteoarthritis,^26^ and hip fracture.^17^
Cardiovascular and metabolic conditions
A significant finding in our analysis is the association between cardiovascular/metabolic conditions and ADRD risk. Diabetes^23, 26, 35, 60, 61^ and its common complication, hypoglycemia,^16, 49, 56, 62^ were both identified as risk factors. Extensive research using EHR data has explored blood pressure’s relationship with ADRD. Hypertension,^23, 61, 63, 64^ hypotension,^20^ and blood pressure variability^65^ all contribute to increased ADRD risk. Additional risk factors include coronary artery disease,^23^ stroke,^23^ and hyperlipidemia.^23^ Obesity’s impact is mixed: it has been identified as a risk factor in one study,^53^ suggested to have a potential protective effect in another,^40^ and found to have no impact in a third,^26, 53^ although this may be influenced by factors like age at assessment and frailty in underweight individuals.
Infections, inflammatory and immune-related conditions
HIV,^66, 67^ E. coli,^68^ and COVID-19^29^ have been identified as risk factors for ADRD. However, several common infections (such as sepsis, pneumonia, lower respiratory tract infections, urinary tract infections, and skin and soft tissue infections) did not exhibit increased ADRD risk.^42^ Regarding herpes viruses, one study observed a slightly decreased risk of dementia among individuals with symptomatic Herpes Simplex Virus 1 (HSV-1) infections untreated by antivirals and a more pronounced 25% decrease in those treated with antivirals.^69^ Another study detected a minor protective link between Herpes Zoster (HZ) and dementia, particularly in frail individuals and females, and only for mixed or unspecified dementia.^50^ Additionally, the inflammatory/autoimmune disease cluster was associated with elevated ADRD risk,^45^ and inflammatory bowel disease was also found to be a risk factor.^32^ Both high urate^40^ and gout^55^ were associated with a decreased risk for ADRD, possibly due to uric acid’s antioxidant effects, which align with observations related to obesity.^22^
Psychiatric conditions
The interplay between depression and ADRD remains unclear. While some view depression as a symptom, others see it as a precursor. In our final analysis, three articles explored the link, and all identified depression as a risk factor for ADRD.^14, 26, 70^ Additionally, psychotic disorders have been reported as a risk factor.^15^
Neurological/ophthalmological conditions
The eyes and brain also form crucial nodes in the ADRD risk network. Retinal vascular occlusion is linked to increased ADRD risk.^58^ Visual impairment, assessed by visual acuity, has also been linked to an elevated ADRD risk,^41^ although one study did not find this connection.^71^ The impact of diabetic retinopathy, a complication of diabetes, remains ambiguous, with one study indicating increased risk^72^ and another observing no effect.^73^ Traumatic brain injury^74^ and epilepsy^75^ are identified as risk factors.
Physical function and frailty
Frailty metrics are factors to consider in ADRD risk assessment. Underweight is identified as a risk factor for ADRD,^20, 26, 40^ although one study had a different finding.^53^ The protective effect of obesity, sometimes observed, could be related to avoiding the increased risk associated with being underweight.^22^ Low physical function, measured by grip strength and the Short Physical Performance Battery (SPPB), is linked to increased risk.^25^ However, another study did not find an association between physical inactivity or unintentional low caloric intake and ADRD risk.
Other medical conditions
Several studies have investigated a miscellany of medical conditions in relation to ADRD. Cancer is noteworthy, with one study showing an elevated ADRD risk in the cancer disease cluster.^45^ In contrast, another study found that malignant melanoma and non-melanoma skin cancers were associated with a reduced ADRD risk, suggesting a protective effect.^59^ Additionally, kidney disease,^23, 36^ hip fracture,^17^ and osteoarthritis^26^ were identified as ADRD risk factors.
Medical interventions
In light of the risk posed by medical conditions to ADRD, researchers have examined various medical interventions to determine if they could mitigate the risk of ADRD. Of the 80 articles assessed, 25 (31.3%) analyzed the association between medical interventions and ADRD. Out of these, 11 were related to cardiovascular and metabolic interventions, 5 to immune, infection, and inflammatory interventions, 4 to psychiatric interventions, 3 to oncology, and 4 to other interventions.
Cardiovascular and metabolic-related interventions
Research has focused on treatments targeting cardiovascular and metabolic conditions to reduce ADRD risk. Medications such as rosuvastatin,^76^ telmisartan,^24^ anticoagulants,^52^ and aspirin,^30^ prescribed primarily for cardiovascular health, were associated with lower ADRD risk. In diabetes management, metformin showed no association with incident dementia compared to no initial treatment within the first 6 months post-diagnosis.^77^ However, it presented a mild protective effect compared to sulfonylureas.^33, 78^ In turn, thiazolidinedione monotherapy and combined therapy with metformin were associated with reduced ADRD risk compared to metformin alone.^78^ Sodium-glucose co-transporter 2 inhibitors were associated with a decreased risk of dementia in patients with atrial fibrillation and type 2 diabetes.^31^ Among surgical interventions, bariatric surgery was associated with increased ADRD risk,^79^ while carotid endarterectomy had no discernible impact.^27^
Immune, infection and inflammatory-related interventions
Immune, infection and inflammatory-related interventions, such as tumor necrosis factor blocking agents,^80^ methotrexate,^21^ and antiherpetic medications,^19^ were found to have protective effects against ADRD, while nonsteroidal anti-inflammatory drugs (NSAIDs)^51^ were observed to increase the risk.
Psychiatric-related interventions
Studies had contradictory conclusions on the impact of the selective serotonin reuptake inhibitor (SSRI) antidepressant class on ADRD risk, with one suggesting it as a risk factor^81^ and the other as protective.^76^ Trazodone, another serotonergic antidepressant often used for insomnia, was reported as a neutral factor.^54^
Oncology and other interventions
Androgen deprivation therapy was linked to an increased risk for ADRD in two studies by the same team.^82, 83^ However, aromatase inhibitor therapy and tamoxifen, used for hormone receptor-positive breast cancer, did not show a difference in dementia risk.^84^
Lifestyle, socioeconomic, psychosocial and environmental factors
EHR data have been utilized to examine the influence of lifestyle, socioeconomic, psychosocial, and environmental factors on ADRD. Out of the reviewed articles, 14 (17.5%) were related to this topic, with 5 articles focused on lifestyles, 5 on socioeconomic factors, 3 on environmental factors, and 2 on psychosocial factors.
Lifestyle
Both smoking^26^ and extensive alcohol consumption^28^ were identified as risk factors for ADRD. Conversely, a healthy lifestyle, including no current smoking, moderate alcohol consumption, regular physical activity, healthy diet, adequate sleep duration, less sedentary behavior, and frequent social contact, exhibited a protective effect against ADRD in patients with type 2 diabetes.^44^ However, diet alone was not found to be protective against ADRD.^37, 85^
Socioeconomic factors
Higher education showed neuroprotective effects in two of three studies on education and ADRD risk,^86, 87^ although the third study found no significant correlation.^38^ Neighborhood disadvantage^26, 88^ and low occupational position^38^ were associated with a higher risk of ADRD.
Psychosocial factors
Psychosocial factors such as social isolation have been identified as risk factors for ADRD.^46^ In contrast, frequent social contact appears to be a protective factor.^39^ Another metric, the “feeling of loneliness,” was not associated with an increased or decreased risk.^46^
Environmental factors
EHR data were used to analyze several environmental risk factors for ADRD. Being born in high stroke mortality states^89^ and exposure to Agent Orange among veterans^90^ were found to be associated with an increased risk of ADRD. Additionally, lithium levels in drinking water were associated with greater risk of dementia in women.^91^
DISCUSSION
In this scoping review, we conducted comprehensive searches across three major databases to identify studies that utilized EHR data to analyze risk factors associated with cognitive decline. The final selection of 80 articles spans a wide range of risk factors, including medical conditions, interventions, lifestyle, socioeconomic status, psychosocial, and environmental factors. The majority of studied medical conditions were associated with an elevated risk of ADRD, whereas medical interventions addressing these conditions often reduced the ADRD risk. Using large and diverse EHR datasets has enriched the literature on antecedent risk factors for dementia and confirmed findings from smaller sample studies.
Longitudinal EHR data are essential for ADRD research due to the slow progression of the disease. The prolonged latency period between risk factor exposure and clinical symptoms necessitates extended observation to identify early signs and risk factors, facilitating causality assessment.
Utilizing EHR datasets offers numerous benefits for ADRD research. These datasets provide extensive data with a wealth of variables, enabling the exploration of diverse medical conditions and interventions to identify risk and protective factors for cognitive impairment and dementia. They allow simultaneous investigation of multiple potential risk and protective factors while enabling comprehensive adjustments for confounders. Access to large and diverse EHR datasets enhances statistical power,^92^ allowing for the study of rare events and the identification of unique risk profiles and disease trajectories.^59, 90^ These datasets encompass individuals from various backgrounds, facilitating research across different populations and the examination of various disease subtypes and clinical presentation variations. EHRs offer rich, detailed clinical data that enable in-depth studies into the clinical aspects and mechanisms of ADRD. Additionally, EHR datasets can confirm and provide unique insights into factors sometimes overlooked or absent in other types of studies.
To facilitate the investigation of risk factors, including socioeconomic aspects, lifestyle, and environmental factors, EHR data are often linked to external datasets using patient identifiers like names, social security numbers, and zip codes. This approach enables the exploration of additional factors and the incorporation of confounding variables from the EHR.
The utilization of EHR data has the potential to help identify new risk factors for ADRD, as well as analyze the traditionally recognized risk factors from a new perspective. Traditional risk factors, such as diabetes,^23, 26, 35, 60, 61^ hypertension,^23, 61, 63, 64^ stroke,^23^ hyperlipidemia,^23^ and traumatic brain injury,^74^ have been confirmed in the studies utilizing EHR data. Additionally, our review presents a list of newly recognized potential risk factors emerging from EHR data analysis. These include, but are not limited to, environmental exposures like Agent Orange,^90^ infections such as COVID-19,^29^ and certain surgical procedures previously not associated with ADRD risk. The study by Kim et al.,^79^ for example, was the first to report that bariatric surgery is associated with an increased risk of ADRD, which contrasts with earlier research that has suggested potential cognitive benefits related to weight loss and metabolic improvement post-surgery. The advent of big data has enabled the identification of these novel risk factors, offering fresh insights into the multifactorial nature of ADRD.
Exploring various populations and EHR datasets reveals inconsistencies in findings on factors like obesity, hypertension, visual impairment, metformin, and underweight, as well as potential conflicts with results from studies not included in this review. The divergent findings underscore the complexity of ADRD risk factors, emphasizing the importance of further research to elucidate these relationships.
Using EHR datasets for ADRD research offers valuable insights but comes with notable limitations. The accuracy and completeness of diagnostic coding in EHRs can vary, impacting the reliability of outcome and exposure classification. Another constraint is the heterogeneity and quality of outcome measures in EHR-based studies. Dementia definitions vary, including or excluding subtypes like vascular dementia and LBD, and using different coding systems (ICD, SNOMED-CT, Read). Some studies use cognitive tests with smaller samples, while others rely on ICD codes for larger samples but potentially less specific diagnoses. This diversity in dementia definitions reflects the complexity of diagnosing and classifying cognitive decline and dementia in real-world clinical settings. It affects the identification of cognitive decline risk factors, leading to variability in reported associations. For instance, studies focusing on specific dementia subtypes may reveal unique risk factors that differ from those identified in broader dementia studies. The choice of diagnostic codes and cognitive assessments can also influence the accuracy of dementia identification, thereby affecting the strength and direction of associations between risk factors and cognitive decline. Minimizing variability in outcome measures could substantially enhance the interpretability and comparability of findings in cognitive decline research. Standardizing the criteria for dementia diagnosis across the majority of healthcare providers, as opposed to limiting it to a few specialists (dementia experts), could simplify the synthesis of research results and refine the accuracy of associations with risk factors. It could also enhance the detection of subtle or nuanced associations between risk factors and cognitive decline that might be obscured by the current heterogeneity.
EHR-based studies provide valuable insights but do not conclusively establish causality due to the potential influence of uncontrolled confounding variables. Investigations into the link between depression and dementia highlight this challenge. Studies related to diabetes management often fail to distinguish between the cognitive effects of specific diabetes medications and those resulting from overall glycemic control, leaving it unclear if observed benefits stem from particular drugs or general blood sugar management. Additionally, a higher incidence of dementia was observed among trazodone users; however, the study suggests that this might not imply a direct causal relationship but could instead reflect the medication’s use in managing symptoms common in the early stages of cognitive impairment.^54^ Therefore, when interpreting results from the included observational studies, readers should be cautious not to presume a direct causal relationship between the risk factors studied and the outcomes.
Furthermore, the inherent biases in observational studies, including potential confounding, selection bias, and information bias, continue to be pervasive issues. Although most included studies attempted to adjust for known confounders, the possibility of residual confounding cannot be dismissed. Adjusting for confounders in survival models may not be sufficient, especially with numerous confounders or significant covariate overlap between the groups being compared. This can lead to issues such as multicollinearity and overfitting. Advanced statistical methods, including propensity score matching (PSM) and inverse probability weighting (IPW), are often used to reduce bias in the estimation of exposure or treatment effects. Nevertheless, EHR-based studies are not equivalent to randomized controlled clinical trials, the gold standard for establishing causality. Researchers should also consider the context of EHR data collection, including demographic and clinical characteristics of study populations. Variations in healthcare access and utilization across different populations could influence the observed associations. Notably, crucial data, such as information on deaths, may be absent from the EHRs. While some studies have cross-referenced EHR data with external databases to create more comprehensive datasets, not all have followed this approach.
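As an illustration of how inverse probability weighting can remove confounding by a measured variable, the following is a minimal sketch on made-up data. It is not drawn from any reviewed study. It assumes a single binary confounder and estimates the propensity score as the observed exposure frequency within each confounder stratum, whereas real analyses typically fit a regression model over many covariates. All names and counts are hypothetical.

```python
from collections import defaultdict


def make_rows(confounder, exposed, n, events):
    """Build n toy records (confounder, exposure, outcome) with `events` outcomes."""
    return [(confounder, exposed, 1 if i < events else 0) for i in range(n)]


# Made-up cohort: within each confounder stratum the outcome risk is identical
# for exposed and unexposed (0.1 when c=0, 0.5 when c=1), so the true effect is
# null. Exposure is far more common when c=1, which confounds the crude contrast.
rows = (
    make_rows(0, 1, 20, 2) + make_rows(0, 0, 80, 8)
    + make_rows(1, 1, 80, 40) + make_rows(1, 0, 20, 10)
)


def propensity_by_stratum(rows):
    """Estimate P(exposed | confounder) as the observed frequency per stratum."""
    exposed, total = defaultdict(int), defaultdict(int)
    for c, a, _ in rows:
        total[c] += 1
        exposed[c] += a
    return {c: exposed[c] / total[c] for c in total}


def weighted_risk(rows, arm, weight_fn):
    """Weighted outcome risk among subjects in the given exposure arm."""
    numerator = denominator = 0.0
    for c, a, y in rows:
        if a == arm:
            w = weight_fn(c, a)
            numerator += w * y
            denominator += w
    return numerator / denominator


ps = propensity_by_stratum(rows)


def unit_weight(c, a):
    return 1.0


def ipw_weight(c, a):
    # Exposed subjects get 1/ps, unexposed subjects get 1/(1 - ps).
    return 1.0 / ps[c] if a == 1 else 1.0 / (1.0 - ps[c])


crude_rd = weighted_risk(rows, 1, unit_weight) - weighted_risk(rows, 0, unit_weight)
ipw_rd = weighted_risk(rows, 1, ipw_weight) - weighted_risk(rows, 0, ipw_weight)

print(f"crude risk difference: {crude_rd:.2f}")
print(f"IPW risk difference:   {ipw_rd:.2f}")
```

In this toy cohort the crude risk difference is 0.24 even though exposure has no effect within either stratum, and weighting brings the estimate back to zero. The correction only works for confounders that are measured and modeled, which is why residual confounding remains a concern in EHR-based studies.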
Additional methodological concerns arise in the statistical analysis of the included studies. Long follow-up times introduce competing events like death, potentially impacting the event of interest (e.g., AD diagnosis). The widely used Cox model is not suitable for handling competing risks properly, as it treats them as censored, potentially yielding biased results when the assumption of independent censoring is violated. In contrast, the Fine-Gray model estimates covariate effects on the sub-distribution hazard, offering insights into risk and protective factors’ relationships with the event of interest while considering competing risks. Therefore, it is crucial to evaluate study-specific quality indicators, like adherence to the STROBE guidelines, validated outcome measures, and the robustness of statistical analyses, to prevent overinterpretation of the findings.
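To make the competing-risk issue concrete, the sketch below compares two nonparametric estimates of cumulative dementia incidence on made-up data: a naive one minus Kaplan-Meier estimate that treats death as censoring, and the Aalen-Johansen cumulative incidence estimate that accounts for death as a competing event. This is not the Fine-Gray regression model itself, which additionally relates covariates to the sub-distribution hazard; it only shows why treating competing events as censored can overstate risk. The data and function names are hypothetical.

```python
# Toy follow-up data: (time in years, event), where
# 0 = censored, 1 = dementia diagnosis, 2 = death without dementia.
data = [(1, 2), (2, 1), (3, 2), (4, 2), (5, 1),
        (6, 2), (7, 2), (8, 1), (9, 2), (10, 2)]


def naive_one_minus_km(data, cause, horizon):
    """1 - Kaplan-Meier for `cause`, treating every other event as censoring."""
    event_times = sorted({t for t, e in data if e == cause and t <= horizon})
    survival = 1.0
    for t in event_times:
        at_risk = sum(1 for u, _ in data if u >= t)
        events = sum(1 for u, e in data if u == t and e == cause)
        survival *= 1 - events / at_risk
    return 1 - survival


def cumulative_incidence(data, cause, horizon):
    """Aalen-Johansen estimate of P(event of `cause` occurs by `horizon`)."""
    event_times = sorted({t for t, e in data if e != 0 and t <= horizon})
    event_free = 1.0  # probability of no event of any type just before t
    cif = 0.0
    for t in event_times:
        at_risk = sum(1 for u, _ in data if u >= t)
        cause_events = sum(1 for u, e in data if u == t and e == cause)
        any_events = sum(1 for u, e in data if u == t and e != 0)
        cif += event_free * cause_events / at_risk
        event_free *= 1 - any_events / at_risk
    return cif


naive = naive_one_minus_km(data, cause=1, horizon=10)
aj = cumulative_incidence(data, cause=1, horizon=10)

print(f"naive 1 - KM (death censored):   {naive:.3f}")
print(f"Aalen-Johansen cumulative incid.: {aj:.3f}")
```

With no censoring in this toy cohort, the Aalen-Johansen estimate equals the observed proportion diagnosed with dementia (3 of 10), while the naive estimate is roughly 0.51 because it implicitly assumes that people who died could still have gone on to develop dementia.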
Lastly, geographic and demographic constraints exist. Despite the extensive data in EHR systems, research is often localized to specific healthcare systems or geographic locales, limiting generalizability. For instance, unlike the UK, the US and other countries appear to underutilize national-level EHR datasets. Despite access to longitudinal EHR datasets across various healthcare systems and regions, research is often confined to specific EHRs. The VHA dataset, though national, predominantly represents male individuals, which poses a demographic limitation. Expanding the use of such comprehensive data sources can provide a more representative sample and enhance research generalizability. The lack of research utilizing large, diverse, national EHR datasets underscores the need for future studies on dementia risk through such resources.
The underdiagnosis of MCI and dementia presents a significant challenge in ADRD research, particularly during early stages. Reliance on EHRs for diagnosis can inadvertently contribute to underreporting, affecting the accuracy of prevalence and incidence rates in the literature. This skewing, due to EHR-based data extraction, might underestimate the true burden of these conditions. Consequently, such underestimation can impact systematic or scoping review findings, altering our understanding of risk factors, disease progression, and intervention effectiveness. Interestingly, individuals frequently interacting with psychiatric services for other conditions are more likely to have cognitive impairment noted in their EHRs compared to those without psychiatric conditions. Therefore, studies that consider psychiatric conditions as risk factors for ADRD particularly require careful interpretation.
Future directions
The analysis of the articles suggests several avenues for future investigation using EHR data.
- Medical interventions: The impact of medical treatments on reducing cognitive decline in the context of various medical conditions remains unclear. There is a lack of research on pharmacological and surgical effects compared to studies on medical conditions and ADRD. Future research should prioritize studying the relationship between medical interventions and cognitive decline more broadly.
- Explore overlooked factors: Investigate additional risk or protective factors, like genetic markers (e.g., apolipoprotein E, presenilin 1 and 2, and amyloid precursor protein), environmental toxins (e.g., lead, pesticides),^93^ mild traumatic brain injury,^94^ endocrine factors (such as hypothyroidism), sleep disturbance (like sleep apnea or chronic sleep deprivation),^7, 95^ bilingualism,^96^ vitamin and nutritional deficiencies,^97^ and the microbiome (e.g., gut microbiome).^98^
- Clinical notes and AI: Almost all the reviewed articles have used data from structured fields of the EHR. Certain conditions and symptoms (e.g., hearing loss, sleep disturbances) that are not consistently captured in structured EHR data may require the examination of clinical notes to identify them, often necessitating AI and natural language processing.
- Early cognitive decline: While the existing literature primarily focuses on dementia or AD, fewer studies address the early onset of AD and the initial stages of cognitive decline, such as mild cognitive impairment and subjective cognitive decline.
- Diversify study populations: Most EHR-based studies have focused on populations with well-defined medical conditions like diabetes, hypertension, cancer, and HIV. To advance research, it’s essential to include a broader range of specific groups, such as sexual and gender minorities,^99, 100^ indigenous populations, those resilient to cognitive decline, and various psychiatric cohorts.
- Database integration: Integrating diverse EHR databases across institutions and locations, as with the UK’s national datasets, can expand study populations and enhance research generalizability. This approach is currently underutilized in the US and other countries.
- Data linkage: EHRs lack some data and require linkage with other datasets,^101^ including insurance claims, genetics, socioeconomic status,^102^ lifestyle, crime, and environmental factors (e.g., air pollution, wildfires, climate change, toxic chemicals).
Limitations
This review has several limitations to acknowledge. First, our search was constrained to three databases, potentially missing relevant studies in other sources. Second, our search term, focused on titles and abstracts, might have overlooked articles using different terminology or mentioning EHR components (e.g., clinical notes) in the methods section. Third, we didn’t perform a bias assessment for included observational studies, which is important considering biases in EHR data collection and outcome measures. Fourth, this review doesn’t aim to provide a comprehensive overview of ADRD risk factors; instead, it focuses on what has been studied using EHR data. Finally, we refrained from conducting a meta-analysis due to variations in adjusted confounders among studies, complicating cross-study comparisons.
CONCLUSION
EHR data, with its rich and diverse longitudinal real-world information, provides substantial insights into the medical conditions, interventions, lifestyle, socioeconomic, and environmental factors associated with ADRD risk. Looking ahead, research should focus on diversifying study populations and integrating EHR data across geographical locations and with non-EHR datasets. There is also a need to enhance the extraction of information from unstructured text to explore a broader range of risk factors for ADRD.
References
- 1. Jack CR Jr, Bennett DA, Blennow K, et al. NIA-AA research framework: toward a biological definition of Alzheimer’s disease. Alzheimer’s & Dementia 2018; 14: 535–562. DOI: 10.1016/j.jalz.2018.02.018. PMID: 29653606. PMCID: PMC5958625.
- 2. Sperling RA, Aisen PS, Beckett LA, et al. Toward defining the preclinical stages of Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s & Dementia 2011; 7: 280–292. DOI: 10.1016/j.jalz.2011.03.003. PMID: 21514248. PMCID: PMC3220946.
- 3. Alzheimer’s Association. 2023 Alzheimer’s disease facts and figures. Alzheimer’s & Dementia 2023; 19. DOI: 10.1002/alz.13016. PMID: 36918389.
- 4. Veitch DP, Weiner MW, Aisen PS, et al. Using the Alzheimer’s Disease Neuroimaging Initiative to improve early detection, diagnosis, and treatment of Alzheimer’s disease. Alzheimers Dement 2022; 18: 824–857. 2021/09/29. DOI: 10.1002/alz.12422. PMID: 34581485. PMCID: PMC9158456.
- 5. Dagley A, La Point M, Huijbers W, et al. Harvard aging brain study: dataset and accessibility. Neuroimage 2017; 144: 255–258. DOI: 10.1016/j.neuroimage.2015.03.069. PMID: 25843019. PMCID: PMC4592689.
- 6. Wolters FJ, Segufa RA, Darweesh SKL, et al. Coronary heart disease, heart failure, and the risk of dementia: A systematic review and meta-analysis. Alzheimers Dement 2018; 14: 1493–1504. 2018/03/02. DOI: 10.1016/j.jalz.2018.01.007. PMID: 29494808.
- 7. Shi L, Chen SJ, Ma MY, et al. Sleep disturbances increase the risk of dementia: A systematic review and meta-analysis. Sleep Med Rev 2018; 40: 4–16. 2017/09/12. DOI: 10.1016/j.smrv.2017.06.010. PMID: 28890168.
- 8. Kuiper JS, Zuidersma M, Oude Voshaar RC, et al. Social relationships and risk of dementia: A systematic review and meta-analysis of longitudinal cohort studies. Ageing Res Rev 2015; 22: 39–57. 2015/05/10. DOI: 10.1016/j.arr.2015.04.006. PMID: 25956016.
