Region-based analysis with functional annotation identifies genes associated with cognitive function in South Asians from India

Hasan Abu-Amara; Wei Zhao; Zheng Li; Yuk Yee Leung; Gerard D. Schellenberg; Li-San Wang; Priya Moorjani; A.B. Dey; Sharmistha Dey; Xiang Zhou; Alden L. Gross; Jinkook Lee; Sharon L.R. Kardia; Jennifer A. Smith

PMC · DOI:10.21203/rs.3.rs-4712660/v1·August 10, 2024

Region-based analysis with functional annotation identifies genes associated with cognitive function in South Asians from India

Hasan Abu-Amara, Wei Zhao, Zheng Li, Yuk Yee Leung, Gerard D. Schellenberg, Li-San Wang, Priya Moorjani, A.B. Dey, Sharmistha Dey, Xiang Zhou, Alden L. Gross, Jinkook Lee, Sharon L.R. Kardia, Jennifer A. Smith

PDF

Open Access

TL;DR

This study identifies genes linked to cognitive function in South Asians from India, highlighting potential genetic risk factors for dementia in this under-researched population.

Contribution

The study applies region-based analysis with functional annotation to identify novel gene associations with cognitive function in South Asians.

Findings

01

APOE, PICALM, and TSPOAP1 were associated with cognitive function in South Asians.

02

rs779406084 is a rare missense variant more common in South Asians than in Europeans.

03

Brain-specific promoter/enhancer variants did not show significant associations.

Abstract

The prevalence of dementia among South Asians across India is approximately 7.4% in those 60 years and older, yet little is known about genetic risk factors for dementia in this population. Most known risk loci for Alzheimer’s disease (AD) have been identified from studies conducted in European Ancestry (EA) but are unknown in South Asians. Using whole-genome sequence data from 2680 participants from the Diagnostic Assessment of Dementia for the Longitudinal Aging Study of India (LASI-DAD), we performed a gene-based analysis of 84 genes previously associated with AD in EA. We investigated associations with the Hindi Mental State Examination (HMSE) score and factor scores for general cognitive function and five cognitive domains. For each gene, we examined missense/loss-of-function (LoF) variants and brain-specific promoter/enhancer variants, separately, both with and without…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Genes3

APOE PICALM TSPOAP1

Proteins3

Diseases3

dementia Alzheimer’s disease AD

Mutations3

rs779406084 rs429358 rs9913145

Figures3

Click any figure to enlarge with its caption.

Keywords

Geneticsgenomicsregion-based analysiscognitive functionwhole genome sequencingSouth Asian

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsGenetic Associations and Epidemiology · Genomics and Rare Diseases · Bioinformatics and Genomic Networks

Full text

Introduction

Dementia is a group of neurological disorders characterized by cognitive impairment. In 2019, the estimated global economic costs of dementia was about $1.3 trillion USD [1]. The public health burden for dementia is borne disproportionately by lower- and middle-income countries, which harbor approximately 61% of affected individuals [1]. Over 50 million people worldwide have Alzheimer’s Disease (AD), the most prevalent form of dementia [2], and this number is projected to reach over 150 million by 2050 [3]. Cognitive decline, even without dementia, increases the need for costly personal and medical care.

While extensive research has focused on risk factors for later-life cognitive decline and dementia, there are still remaining questions regarding its etiology. For example, AD is a result of the accumulation of amyloid beta plaques and neurofibrillary tangles in the brain [4]. Amyloid beta and tau protein metabolism may be influenced by genetic variants that alter chemical properties or abundance of relevant proteins [5]. Heritability estimates for AD are high (60–80%) [6], indicating that the identification of AD-associated variants is critical for a deeper etiological understanding. Heritability of cognitive function is also relatively high across the life course (40–80%) [7]. However, the vast majority of genetic loci for measures of cognitive function and dementia were identified from studies conducted in European Ancestry (EA) participants. A deeper exploration of the genetic factors underlying late-life cognition and dementia in non-EA populations is now needed to both identify population-specific risk variants across the genome and gauge the relative importance of previously-identified loci.

With over 1.4 billion people, India is the second most populous country in the world, and the public health burden of dementia is dramatically increasing as the population both grows and ages. The prevalence of dementia among South Asians living in India varies by geographic location and sociodemographic characteristics (e.g., rural vs. urban), and is approximately 7.4% among individuals 60 years and older [8]. While studies have indicated that older age, lower education, diabetes, obesity, and other factors increase risk of dementia in India [9], there has been little research on genetic risk factors. Therefore, it is unclear whether the same genes that have been associated with dementia and cognitive decline in EA have a similar influence on dementia risk in South Asians. Likewise, there may be causal risk variants in known AD genes, or in other genes, that are unique to India.

Detection of rare variants associated with measures of cognitive function in non-EA populations may be difficult due the combination of increased genetic diversity and smaller sample sizes available for genetic research, which leads to a loss of power. Statistical power can be increased by grouping together variants within a gene or genomic region that have the same functional annotation, such as those that alter protein structure (e.g. loss of function (LoF) or missense variants) or those in regulatory elements (e.g. gene promoter or enhancer regions), which helps increase the likelihood of selecting probable causal variants [10].

In this study, we examined whether 84 genes previously associated with AD in EA are also associated with seven measures of cognitive function in 2,680 participants from the Diagnostic Assessment of Dementia for Longitudinal Aging Study of India (LASI-DAD), a nationally-representative study that includes diverse ethno-linguistic and geographic groups. From whole genome sequence (WGS) data, we selected missense/loss-of-function (LoF) single-nucleotide variants (SNVs) and brain-specific promoter and enhancer SNVs within each gene. This work will help elucidate genetic variants associated with cognitive function in South Asians across India, which may play an important role in risk stratification and help guide intervention and treatment plans for those at risk for dementia in India.

Methods

Study population

LASI [11] is a nationally representative cohort of Indian adults who are at least 45 years of age. LASI-DAD, an ancillary study investigating risk factors for dementia, enrolled 4,096 LASI participants from 18 states and union territories across India. Participants were selected by two-stage stratified random sampling across states/territories in India and with respect to cognitive impairment risk, with sampling strategy described elsewhere [12, 13]. Briefly, participants were classified as low risk or high risk for cognitive impairment based on their performance on core cognitive tests conducted in the larger LASI cohort, or on proxy reports if the participant did not complete those tests. Then, an approximately equal number of respondents in the high risk and low risk strata were randomly drawn from each state/territory with a target sample size proportionate to the population size. Participants underwent neurocognitive testing with tests logically and culturally adapted from tests present in the Harmonized Cognitive Assessment Protocol (HCAP) [12], informational interviews, and a blood draw to extract DNA for whole genome sequencing. A total of 2,680 participants with complete genotype and cognition data were included in the analysis.

Whole-genome sequence data

Whole genome sequencing (WGS) at an average read depth of 30x was performed by MedGenome, Inc (Bangalore, India) using DNA extracted from blood samples from 2,762 LASI-DAD participants. Genotype calling and quality control (QC) were performed at the Genome Center for Alzheimer’s Disease (GCAD) at the University of Pennsylvania [14]. Briefly, sample-level quality control included checks for low coverage, sample contamination, sex discrepancies, concordance with previous genotype data, and duplicates [14]. After excluding control samples and samples with low quality and/or unresolved identity, a total of 2,680 samples were retained in the analysis. At the genotype level, each genotype was evaluated and set to missing if read depth was less than 10 (DP < 10) or genotype quality score was less than 20 (GQ < 20). At the variant level, a variant was excluded if it was monomorphic, was above the 99.8% Variant Quality Score Recalibration (VQSR) Tranche (the quality score was beyond the range that contains 99.8% of true variants), had a call rate ≤ 80%, or had an average mean depth > 500 reads. We further removed variants that were in low complexity regions identified with the mdust program.[15] After quality control and filtering, we retained a total of 71,109,961 autosomal bi-allelic variants that include 66,204,161 single nucleotide polymorphisms (SNPs) and 4,905,800 indels.

Principal component analysis and genetic relationship matrix

We estimated genetic principal components (PCs) and the genetic relationship matrix (GRM) in GENESIS (version 2.26.0) [16, 17]. For this analysis, we included variants with minor allele frequency (MAF) ≥ 5% and pruned for LD (r^2^ = 0.1, window size = 500kb) to select independent variants. Kinship coefficients were first estimated using “snpgdsIBDKING” function. Subsequently, genetic principal components (PCs) were calculated using “PCair”, which estimates population structure while accounting for cryptic relatedness in the samples. Specifically, PCs were first estimated in a set of unrelated individuals (kinship cutoff = 0.044) to obtain robust variant weights, which were then used to project PCs in the rest of the sample. Following this, the genetic relationship matrix (GRM) was estimated using “PCrelate” by simultaneously adjusting the top 2 PCs to avoid potential confounding from population structure.

Measures of Cognitive Function

We analyzed seven measures of cognitive function including five cognitive domains (memory, orientation, language/fluency, executive function, and visuospatial function), general cognitive function constructed from the five cognitive domain scores, and the Hindi Mental State Exam (HMSE) score. The HMSE is a version of the Mini Mental State Exam dementia screener translated into Hindi. It is designed to be administered to participants from a population where a significant proportion of individuals are illiterate, and is scored as the sum of 22 items which totals to an integer between zero and 30, with a higher score indicating more cognitive intactness [18]. The five cognitive domain scores are factor scores of a collection of tests assigned to a broad domain of cognition as informed by Cattell-Horn-Carroll (CHC) theory of human cognitive abilities, with composite weights and tests described elsewhere [19]. The cognitive domain and general cognitive function scores were each estimated using item-response theory (IRT) and were normalized to a Gaussian distribution with mean of zero and variance of one in the full LASI-DAD subcohort [19].

Gene selection

We selected a total of 84 genes from the two largest genome-wide association studies (GWAS) for AD in EA (S1 Table) [20, 21], as well as TOMM40 and APOC1 which are proximal to APOE and are known to be associated with Alzheimer’s disease [22, 23]. Briefly, Bellenguez et al. [20] performed a two-stage GWAS of 10 case-control studies across Europe. Stage 1 included 39,106 clinically diagnosed AD cases, 46,828 proxy AD and dementia-related disorder (ADD) cases, and 401,577 controls. Stage 2 included 25,392 AD cases and 276,086 controls. Bellenguez et al. identified 75 risk loci, of which 42 were novel. The authors then conducted pathway analysis and designed a gene prioritization algorithm to stratify loci according to their likelihood of having a causal effect on ADD risk. For this study, we selected a total of 73 genes from Bellenguez et al. [20] including those that were labeled as known loci or those that were classified as having the highest likelihood of a causal effect on ADD (tier 1). Wightman et al. [21] conducted a meta-analysis of 13 studies of EA participants across the United States and Europe. The total sample size was 1,126,563 individuals, which included 90,338 AD cases (46,613 were proxy cases) and 1,036,225 controls (318,246 were proxy controls). For this study, we selected 45 genes from independent 38 loci that reached genome-wide significance (p < 5×10^− 8^) in Wightman et al. There was an overlap of 36 genes between the two AD GWAS. See Supplemental Methods for additional details on selecting genes within the identified AD loci (S1 Methods).

Gene boundaries were defined by GRCh38.p14 in NCBI Gene, which uses NCBI RefSeq to annotate gene positions. We selected all SNVs within the gene start and stop positions for the missense/LoF analysis. For the brain-specific promoter/enhancer analysis, we selected SNVs within a ± 20 kb buffer of the gene’s transcription start site. Only genes with at least two missense/LoF or promoter/enhancer SNVs (defined below) within the region were included in the final analysis.

Definition of missense/LoF and promoter/enhancer SNVs

We followed similar definitions of missense/LoF SNVs, promoter SNVs, and enhancer SNVs as Li et al.[24] Briefly, we used the Variant Effect Predictor (VEP) [25] and LOFTEE [26] with GENCODE as the transcript annotation reference to identify missense and LoF SNVs, respectively. We additionally classified missense and LoF variants based on the confidence of their predicted function. LoF SNVs were annotated as either high confidence or low confidence using LOFTEE, and missense SNVs were assigned a REVEL score. REVEL scores are generated through an ensemble method to measure the pathogenicity of a missense SNV [27], with higher scores indicating greater likelihood of causing diseases. Missense SNVs with REVEL score > 0.5 are considered to have high confidence. Next, we used the WGS Annotator (WGSA) v0.95 pipeline [28] to define promoter SNVs as those that fell within ± 5kb of a gene’s transcription start site with at least one H3K4me3 annotation for brain tissues (E067-E074, E081, E082) from the ENCODE database. We defined enhancer SNVs as those that fell within ± 20kb of the gene’s transcription start site and overlapped with an enhancer defined by EnhancerFinder in the brain.

Annotation selection

For both the missense/LoF and promoter/enhancer analyses, we included all SNVs that had no missing annotation weights, regardless of minor allele frequency (S2 Methods). Annotation weights were retrieved using WGSA v0.95 [28]. We selected a subset of annotations similar to Li et al. [24], including those that predicted deleteriousness, predicted impact on the protein, and that summarized evolutionary conservation. For missense/LoF SNVs, we used CADD_raw_rankscore, a measure of variant deleteriousness combining multiple genomic features of each variant [29]; GERP_RS_rankscore, a measure of variant conservation [30]; Eigen.phred, a measure of variant deleteriousness using an unsupervised learning method [31]; and fathmm.MKL_coding_rankscore, a score from a machine learning method incorporating other annotations to predict deleteriousness of the variant from coding variants [32]. For promoter/enhancer SNVs, we used CADD_raw_rankscore [29], GERP_RS_rankscore [30], Eigen.PC.phred [31], fathmm.MKL_non.coding_rankscore [32], and GenoCanyon_rankscore, a measure of variant conservation [33]. Genome-wide ranks of the associated annotation scores were used to generate the Phred scores (i.e., the logarithmically transformed annotation score percentiles) required for STAAR analysis.

Statistical methods

All analyses were conducted in R (ver. 4.2.0). WGS data were converted from VCF files to SeqArray GDS format using SNPRelate [34] and SeqArray R packages. We then used the variant-Set Test for Association using Annotation infoRmation (STAAR) v0.9.6.1 to perform gene-based analysis using functional annotations for missense/LoF (including all variants except low-confidence LoF), missense/LoF (high confidence variants only), and promoter/enhancer regions separately [36]. In STAAR, linear mixed models were used to test each gene region for association with each of the seven measures of cognitive function separately, both with and without annotation weights. Model 1 adjusted for age, sex, state or union territory, and the first ten principal components of global ancestry. Model 2 additionally adjusted for educational attainment, rural or urban residence, and literacy status (yes/no). Each model incorporated a genetic relatedness matrix to account for relatedness between subjects, and geographic state or union territory was used to define heterogeneous variances within the linear mixed model.

For each gene, both with and without including annotation weights for the SNVs, we examined the STAAR p-value which is calculated from modified SKAT-(1,1), SKAT-(1,25), Burden-(1,1), Burden-(1,25), ACAT-V(1,1) and ACAT-V(1,25) tests. For analyses with and without annotation weights, separately, a Benjamini-Hochberg FDR q < 0.1 was used to declare significance.

For each gene region that was associated with a measure of cognitive function at FDR q < 0,1, we next performed a single variant analysis to identify the variants most strongly associated using a score test in STAAR. The same models from the gene-based analysis were used for single variant analysis. For the SNV with the lowest p-value within each identified gene, we compared the allele frequency in LASI-DAD to that found in EA samples registered in gnomAD v3 [37] to examine whether risk alleles were enriched in LASI-DAD.

Results

The LASI-DAD analytic sample had a mean age of 69.6 (SD = 7.3) years (Table 1). The majority of participants could not read or write (56.4%), lived in rural areas (63.3%), and had less than lower secondary education (75%). Mean HMSE score was 22.7 (SD = 5.4) (S2 Table).

We next characterized the distribution of the annotation weights across missense/LoF SNVs and promoter/enhancer SNVs. The missense/LoF SNVs exhibited relatively little variation for almost all annotations and tended to be high. For these annotations, which were on a scale from 0 to 1, the median score ranged from 0.97 to 0.99 (S3 Table). The promoter/enhancer SNVs showed greater variation across annotation weights and tended to be lower (S4 Table). The Eigen-Phred and Eigen-PC-Phred rank scores had more variation and relatively low weights for missense variants, but low variation and higher weights for promoter/enhancer variants, due to their calculation with different training data for each functional class of variant (S Methods).

Missense/loss-of-function (LoF) analysis

Of the 84 genes selected for analysis, 79 had at least two missense/LoF SNVs with complete annotations, and the median number of missense/LoF SNVs across the genes was 21 (Table 2). In Model 1, 16 genes were nominally associated with at least one measure of cognitive function (p < 0.05, S5 Table), with 3 genes associated at FDR q < 0.1 in the analysis without annotation weights (Table 3). Specifically, APOE was associated with HMSE score (FDR q = 0.08), general cognitive function (FDR q = 0.04), executive function (FDR q = 0.07), and orientation (FDR q = 0.07). PICALM was associated with HMSE score (FDR q = 0.08), and TSPOAP1 was associated with executive function (FDR q = 0.07). In Model 2, which additionally adjusts for rural/urban location, literacy, and education, 20 genes were nominally associated with at least one measure of cognitive function (p < 0.05, S6 Table), and PICALM was significantly associated with HMSE score after correction for multiple testing (FDR q = 0.098) in the analysis without annotation weights (Table 3).

As shown in Table 3, the results were similar when we used annotation weights. At FDR q < 0.1, APOE was associated with HMSE score (FDR q = 0.08) and general cognitive function (FDR q = 0.07), and PICALM was associated with HMSE score (FDR q = 0.08). In Model 2, at FDR q < 0.1, PICALM was associated with HMSE score (FDR q = 0.09).

For each gene associated with a cognitive measure at FDR q < 0.1, we examined associations between each SNV within the gene region, without annotation weights, and the cognitive outcome of interest (Table 4). As expected, the most strongly associated variant in Model 1 within APOE for all measures of cognitive function was rs429358 (HMSE p = 2.9×10^− 4^, general cognitive function p = 1.4×10^− 4^, executive function p = 4.1×10^− 4^, orientation p = 2.4×10^− 4^; Fig. 1, S1-S3 Figs), which is the missense variant in exon 4 of APOE that changes cysteine to arginine and differentiates the APOE ε4 allele from ε2 and ε3. Removal of this SNV results in APOE losing significance. For TSPOAP1 in Model 1, the most strongly associated variant with executive function was rs9913145 (Model 1 p = 5.7×10^− 4^), a missense variant in exon 17 that changes glutamine to arginine (Fig. 2). This variant had a MAF of 0.15 in LASI-DAD, and a MAF of 0.12 in EA samples in gnomAD, which indicates that the minor allele is relatively common both in LASI-DAD and in EA populations. In PICALM in Model 1 and Model 2, the most strongly associated SNV with HMSE was rs779406084 (Model 1 p = 4.2×10^− 4^, Model 2 p = 1.6×10^− 4^), a missense variant in exon 19 that changes threonine to methionine (Fig. 3, S4 Fig). Rs779406084 was in high LD with all missense/LoF SNVs in PICALM (|D’|=1) but was not correlated with any other missense/LoF SNV in the gene (r^2^ < < 0.2) including with the one other missense/LoF SNV with p < 0.05. This SNV has a CADD score of 24, indicating that it is in the top 0.4th percentile of all deleterious SNVs. It also has a MAF of 7.5×10^− 4^ in LASI-DAD. While very rare, this variant occurs more often in LASI-DAD compared to EA samples in gnomAD (EA MAF = 1.5×10^− 5^).

High-confidence missense/LoF analysis

Of the 84 genes analyzed, 51 had at least two high-confidence LoF or missense SNVs with REVEL > 0.5. In Model 1, 8 genes were nominally associated with at least one measure of cognitive function (p < 0.05, S7 Table), but none were significant after correction for multiple testing (FDR q > 0.1). In Model 2, 10 genes were also nominally associated with at least one measure of cognitive function (p < 0.05, S8 Table) with four genes overlapping from Model 1 (ABI3, APOE, DGKQ, and INPP5D), but none were significant after correction for multiple testing (FDR q > 0.1). For both analyses, the results did not change substantively when annotation weights were incorporated.

Promoter/enhancer analysis

Of the 84 genes analyzed, 77 had at least two brain-specific promoter or enhancer SNVs with complete annotation weights, and the median number of brain-specific promoter/enhancer SNVs across the genes was 93 (Table 2). In Models 1 and 2, 21 and 22 genes, respectively, were nominally associated with at least one measure of cognitive function without annotation weights (p < 0.05), but none were associated after multiple testing correction (all FDR q > 0.1, S9 and S10 Tables). When we incorporated annotation weights, 18 genes in Model 1 and 22 genes in Model 2 were nominally associated with at least one measure of cognitive function, but again none were associated after multiple testing correction (all FDR q > 0.1, S9 and S10 Tables), with 18 and 20 genes being nominally associated with at least one measure of cognitive function with and without annotation weights, respectively. In Model 1, USP6NL, INPP5D, and KAT8 were no longer nominally associated with any measure of cognitive function after incorporation of annotation weights. In Model 2, EGFR and APOE were no longer nominally associated with any measure of cognitive function after incorporation of annotation weights, but APOC1 and SORT1 became nominally associated.

Discussion

We performed a gene-based analysis examining the association between missense/LoF and brain-specific enhancer and promoter variants in previously-identified AD-associated genes and seven measures of cognitive function in South Asians across India. Using only missense/LoF variants, three genes were associated with at least one measure of cognitive function after multiple testing correction, including APOE with multiple cognitive measures. However, no genes were associated in the brain-specific promoter and enhancer analysis. The most strongly associated variants were missense SNVs with high predicted deleteriousness. One of the most significantly associated missense variants was very rare, yet appeared to be enriched in LASI-DAD compared to EA samples in public databases.

We found that APOE is significantly associated with HMSE score, general cognitive function, executive function, and orientation in the missense/LoF analysis in Model 1. Apolipoprotein E (APOE) facilitates cholesterol and phospholipid transfer between cells, and complexes with amyloid β proteins in the brain for removal, inhibiting the amyloid β plaque formation necessary for AD onset [38]. APOE alleles confer different risks for Alzheimer’s disease. Relative to the ε3 allele, ε4 is associated with increased risk of Alzheimer’s disease in EA [39], and the ε4/ε4 genotype is also associated with cognitive decline in those with Alzheimer’s disease [40]. In our analytic sample from LASI-DAD, the ε4 allele frequency is estimated to be approximately 10.9%, which is less than reported frequency among EA samples in the US (14%) [41] and while not common is still frequent. The ε4 allele has somewhat different associations with AD risk across races/ethnicities, with ε4 and associated variant effects being stronger in EA populations compared to African-Americans [42]. Rs429358 is used to differentiate between ε3 and ε4 alleles in APOE, and has a CADD score of 16.6, which places it in the top 2nd percentile of deleterious SNVs. Although previous studies with a subset of the current LASI-DAD sample did not find an association between cognitive function and rs429358 [17, 43], this was likely due to smaller sample size and/or less regional variation in the previous studies. Our reported associations with APOE are not surprising as working memory and executive function deficits are often early markers of AD [44].

Phosphatidylinositol binding clathrin assembly protein (PICALM) facilitates endocytosis of APP [45], which is needed to form β-amyloid plaques that lead to AD. PICALM was found to be associated with cognitive function in EA samples [46]. PICALM variants identified in EA have had mixed associations in East Asian samples, with which the South Asian population of India shares ancestry [47]. For example, some variants identified as associated with AD in a large meta-analysis of Chinese GWAS near PICALM [48]were not found to be associated in smaller Indian studies [49]. We found that the sentinel SNV in PICALM for HMSE score is rs779406084, a very rare variant with a high CADD score at 24, placing it in the top 0.2th percentile of all deleterious SNVs. To our knowledge, this is the first study that reports an association with rs779406084 and cognitive function. This is likely due to the rarity of this variant in EA samples (MAF = 1.5×10^− 5^) and an association was likely found in our study due to its comparatively higher frequency in LASI-DAD (MAF = 7.5×10^− 4^). Further work is needed to elucidate the specific effects of this variant on the protein and replication in other cohorts is needed.

Translocator protein (TSPO) associated protein 1 (TSPOAP1) regulates calcium channels in nerve synapses [50]. It interacts with the protein TSPO which is involved in inflammation pathways [51]. TSPOAP1 variants were associated with AD in a large transethnic AD GWAS [52]. The sentinel SNV of TSPOAP1 in LASI-DAD was rs9913145, which has a CADD score of 1.12 indicating that it is not strongly deleterious. This variant is slightly more common in LASI-DAD (MAF = 0.15) compared to EA samples (MAF = 0.12). To our knowledge, this variant has not otherwise been reported to have an association with cognitive function or dementia. Given the relatively low CADD score of the variant and the relatively common frequency in EA samples, this variant may tag a haplotype specific to South Asians within TSPOAP1 that is associated with executive function.

We found no associations between brain-specific promoter and enhancer SNVs within the known AD genes and any of the measures of cognitive function in our sample after multiple testing correction. This is likely due to promoter/enhancer SNVs having more subtle effects on AD gene expression compared to the potentially more deleterious effects from missense/LoF SNVs. We also found that annotation weights did not substantively change our analysis results. This may be because the missense/LoF variants had relatively small variance in their annotation weights and tended to be high, resulting in little additional statistical information.

We found many genes that were nominally associated with each measure of cognitive function in the missense/LoF analysis and brain-specific promoter/enhancer analysis. Genes associated with multiple cognitive measures include ADAM17, OTULIN, and ABCA7, which are involved in amyloid-β metabolism or in immune signaling [38, 53, 54]. In both Model 1 and Model 2, ABCA7, ADAM17, APOE, OTULIN, and TSPOAP1 were all at least nominally associated with three or more measures of cognitive function. These genes all play a role in cholesterol and APP metabolism (ABCA7, ADAM17, APOE) or are involved in inflammation pathways (OTULIN, TSPOAP1).

Similar gene-based analysis studies were conducted in EA samples. One large genome-wide gene-based AD study conducted in the UK BioBank on different categories of rare missense/LoF variants found that three gene regions were associated with AD parent proxy cases, including TOMM40/APOE [55]. Notably, detection of these regions depended on resolving variants into categories of high confidence and predicted loss-of-function effects. Another gene-based AD analysis conducted in the ADES-FR study found that protein-truncating rare variants and strictly damaging rare variants in TREM2, ABCA7, and SORL1 were associated with early-onset AD, but not with late onset AD [56]. Given that the previous studies focused on very specific classes of rare variants in EA samples, it is no surprise that these genes were also at least nominally associated in our study.

Although many genes were nominally associated in our analysis, few genes were significant after correction for multiple testing. This could be in part because we selected genes associated with AD, which may have weaker effects on cognitive function changes that precede AD. Further, the genes were identified through GWAS which excels in identifying primarily non-functional common variants which may be correlated with causal variants. In this study, > 95% of our analyzed variants were rare (MAF < 5%). Although we focused on variants more likely to be causal, it is possible that the sentinel SNPs identified in the GWAS were not tagging variants in the functional classes we examined, and that more genes would have reached significance if common, non-functional variation was included in our analysis. Further, the genes identified were in large cohorts of EA. Genetic differences between EA and South Asians, including allele frequency and linkage disequilibrium, could have contributed to the lack of findings. Another explanation is that genetic associations with cognitive function may be attenuated in this population due to more heterogeneity in environmental factors across India, such as sociodemographics, socio-cultural factors, and air pollution, each of which is associated with cognitive function [57–61]. Finally, the tendency toward associations being nominally significant, but not significant after multiple testing correction, may be a result of the smaller sample size.

One limitation of this study is that we examined only two classes of functional annotations. It may be that variants with other functional consequences besides missense/LoF and promoter/enhancer variants could influence associations with cognitive function. Another limitation is that we could not include insertion/deletion variants in this analysis, as annotation weights are not available; however, it is possible that these variants may have more deleterious effects on proteins and their removal may have attenuated signal. Additionally, the LASI-DAD cohort design oversamples LASI participants with higher cognitive impairment risk, which may result in different observed genetic associations with cognitive function compared to studies sampled in other ways [17]. Finally, cognitive measures may have been biased due to administering the tests in many different languages [43]. However, no systematic bias with respect to language has been detected in LASI-DAD [62].

Our study also has several strengths. The prioritization and aggregation of SNVs based on their actual or predicted functional consequences likely increased signal for associations between the genes and cognitive function by focusing on variants that are more likely to have causal effects. Gene-based analysis with functional annotation also more directly links SNVs disease etiology, allowing a greater understanding of the types of variation within these genes that contribute to cognitive function. Additionally, to our knowledge, our study is the first to examine gene-based, rare variant associations with cognitive function in South Asians living in India. Thus, this work addresses an important health disparity in an understudied population [63]. Furthermore, our cohort presented a unique genetic environment to study potentially novel associations with understudied genetic variants due to its large genetic heterogeneity, unique subpopulations, and unique genetic ancestry [43, 47, 64, 65]. Finally, we examined several measures of cognitive function, which allows us to determine which specific cognitive domains are associated with each gene.

In conclusion, we found that three genes (APOE, PICALM, and TSPOAP1) associated with Alzheimer’s disease in EA are also associated with measure of cognitive functions in South Asians living in India, with the association primarily driven by missense/LoF SNVs. Associations were in part driven by rare, deleterious alleles, including a very rare SNV enriched in LASI-DAD compared to EA. Future functional studies are needed to verify and characterize SNVs found within this study.

Bibliography64

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Wimo A, Seeher K, Cataldi R, Cyhlarova E, Dielemann JL, Frisell O, The worldwide costs of dementia in 2019. Alzheimer’s Dement. 2023;n/a. doi:10.1002/alz.12901 PMC 1084263736617519 · doi ↗ · pubmed ↗
2Schneider JA, Arvanitakis Z, Leurgans SE, Bennett DA. The neuropathology of probable Alzheimer disease and mild cognitive impairment. Ann Neurol. 2009;66: 200–208. doi:10.1002/ana.2170619743450 PMC 2812870 · doi ↗ · pubmed ↗
3Breijyeh Z, Karaman R. Comprehensive Review on Alzheimer’s Disease: Causes and Treatment. Molecules. 2020;25. doi:10.3390/molecules 2524578933302541 PMC 7764106 · doi ↗ · pubmed ↗
4Huber CM, Yee C, May T, Dhanala A, Mitchell CS. Cognitive Decline in Preclinical Alzheimer’s Disease: Amyloid-Beta versus Tauopathy. J Alzheimer’s Dis. 2018;61: 265–281. doi:10.3233/JAD-17049029154274 PMC 5734131 · doi ↗ · pubmed ↗
5Area-Gomez E, Schon EA. Alzheimer disease. Adv Exp Med Biol. 2017;997: 149–156. doi:10.1007/978-981-10-4567-7_1128815528 · doi ↗ · pubmed ↗
6Sims R, Hill M, Williams J. The multiplex model of the genetics of Alzheimer’s disease. Nat Neurosci. 2020;23: 311–322. doi:10.1038/s 41593-020-0599-532112059 · doi ↗ · pubmed ↗
7Mollon J, Knowles EEM, Mathias SR, Gur R, Peralta JM, Weiner DJ, Genetic influence on cognitive development between childhood and adulthood. Mol Psychiatry. 2021;26: 656–665. doi:10.1038/s 41380-018-0277-030644433 PMC 6570578 · doi ↗ · pubmed ↗
8Lee J, Meijer E, Langa KM, Ganguli M, Varghese M, Banerjee J, Prevalence of dementia in India: National and state estimates from a nationwide study. Alzheimer’s Dement. 2023;19: 2898–2912. doi:10.1002/alz.1292836637034 PMC 10338640 · doi ↗ · pubmed ↗