Rare Variants Analyses Suggest Novel Cleft Genes in the African Population
Azeez Alade, Peter Mossey, Waheed Awotoye, Tamara Busch, Abimbola Oladayo, Emmanuel Aladenika, Mojisola Olujitan, J.J Lord Gowans, Mekonen A. Eshete, Wasiu L. Adeyemo, Erliang Zeng, Eric Otterloo, Michael O’Rorke, Adebowale Adeyemo, Jeffrey C. Murray, Justin Cotney

TL;DR
This study identifies new genes linked to cleft lip and palate in African populations by analyzing rare genetic variants.
Contribution
The study discovers novel candidate genes for non-syndromic orofacial clefts in African populations using rare variant analysis.
Findings
Thirteen genes showed suggestive associations with non-syndromic orofacial clefts.
Eight genes were consistently expressed in craniofacial tissues during facial development.
Three genes showed significant mutation constraint, indicating their potential importance.
Abstract
Non-syndromic orofacial clefts (NSOFCs) are common birth defects with a complex etiology. While over 60 common risk loci have been identified, they explain only a small proportion of the heritability for NSOFC. Rare variants have been implicated in the missing heritability. Thus, our study aimed to identify genes enriched with nonsynonymous rare coding variants associated with NSOFCs. Our sample included 814 non-syndromic cleft lip with or without palate (NSCL/P), 205 non-syndromic cleft palate only (NSCPO), and 2150 unrelated control children from Nigeria, Ghana, and Ethiopia. We conducted a gene-based analysis separately for each phenotype using three rare-variants collapsing models: (1) protein-altering (PA), (2) missense variants only (MO); and (3) loss of function variants only (LOFO). Subsequently, we utilized relevant transcriptomics data to evaluate associated gene expression…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCleft Lip and Palate Research · Prenatal Screening and Diagnostics · Folate and B Vitamins Research
INTRODUCTION
Nonsyndromic orofacial clefts (NSOFCs) constitute the largest proportion of orofacial clefts (OFCs) and are estimated to affect 1.25 in every 1000 live births worldwide^1^. Due to distinct embryological origins and epidemiological patterns, NSOFCs can be broadly classified into nonsyndromic cleft lip with or without palate (NSCL/P) and nonsyndromic cleft palate only (NCSPO)^2^. The primary treatment for NSOFCs is surgical to correct structural defects. However, restoring optimal function in affected children requires a multidisciplinary team of orthodontists, maxillofacial surgeons, prosthodontists, otolaryngologists, geneticists, and pediatricians, among others^3^. In the United States, the annual cost of hospital stays for children with OFCs was over 400 million in 2013^4^. Moreover, the team of experts required for OFC care is often unavailable in resource-limited settings, leading to significant inequalities in cleft management^5^
Developing preventive or improved therapeutic strategies for NSOFCs requires a thorough understanding of their etiology. As with many complex traits, the etiology of NSOFCs is multifactorial, with genetic factors playing a considerable role^6^. According to a recent review, over 60 (> 40 associated with NSCL/P) risk loci have been implicated mainly through common variant association studies^7^. However, all identified loci/genes are estimated to explain only a small fraction (~ 25% for NSCL/P and even less for NSCPO) of the estimated heritability^8^. Low frequency/rare variants, gene-gene interactions, and gene-environmental interaction effects may likely explain the missing heritability^9^.
The role of rare variants in complex traits is well documented^10^, and a recent study found rare variants to be responsible for a larger proportion of the missing heritability in complex traits etiology^11^. However, studies evaluating the role of rare variants for NSOFCs have been restricted largely to the resequencing of known cleft candidate genes, limiting the discovery of novel genes. Although resequencing has provided insights into the burden of rare variants in these candidate genes, the small effect sizes often estimated for complex traits, together with the low MAF of rare variants (minor allele frequency [MAF] < = 0.01) makes the conventional single variants association analysis underpowered.^12–14^. A more efficient approach to rare variant analysis is to select a fixed MAF threshold and conduct an aggregate test on all variants with a MAF below this threshold within a specified region (e.g. gene), either by assigning equal weight to all variants identified or by varying weights based on the estimated variance of each variant under the null hypothesis of no association^13^. This approach has been applied successfully in discovering genes associated with complex traits like blood pressure, myocardial infarction, and schizophrenia^15–17^.
Attempts at leveraging rare variant aggregation to identify novel genes for NSOFCs are relatively new and have been limited to samples of individuals of European ancestry^8,18,19^. Additionally, these studies included all the identified rare variants within a gene, an approach that has been shown to be less sensitive due to the inclusion of synonymous variants, which often do not contribute to disease etiology^20^. To address these limitations in previous studies, we conducted gene-based rare variant aggregation tests, including only the protein-altering rare variants using our African genome-wide association study (GWAS) data. We hypothesized that genes enriched for rare protein-altering variants associated with NSOFCs would contribute to the etiology of NSOFCs. Further, we utilized transcriptomics data to provide additional evidence for the associated genes. The African population, the ancestral origin of modern humans, harbors the greatest number of genetic variations and, thus, provides a considerable opportunity for genetic discoveries^21^.
RESULTS
Gene-Based Rare Variant Results
Our analysis included 21, 829 rare variants (21, 333 missense variants, and 70 loss of function variant) (supplementary Table 2). We identified thirteen genes with suggestive associations (E-04 for the PA, Bonferroni corrected p-value = 0.05/5784(9E-06), E-04 for the MO, Bonferroni corrected p-value = 0.05/5676 (9E-06), and 0.05 for the LOFO model, Bonferroni corrected p-value = 0.05/26(2E-03)) were identified in the African GWAS data. Among the genes showing suggestive associations in the protein-altering (PA) model, ABCB1 and TTC28 were associated with NSCL/P, while TTC12, PDZD8, FCRL4, CENPF, and SLC16A9 were associated with NSCPO. In the NSCL/P sub-group analyses into NSCLO and NSCLP, MASP2 and OR5K1 genes were associated with NSCLO, while EXPH5, CSAD, ALKBH8, and RGL4 were associated with NSCLP. The results were similar for the missense-only (MO) model, with the addition of ABCB1 identified with NSCL (Table 1). None of the genes showed association in the loss-of-function-only (LOFO) models. Furthermore, among the associated genes, three genes (ABCB1, TTC28 and PDZD8 genes) showed significant mutation constraint to missense mutations using the GnomeAD database.
Gene prioritization using transcriptomics data.
10 genes (ABCB1, TTC28, TTC12, CSAD, EXPH5, SLC16A9, MASP2, ALKBH8, CENPF and PDZD8) of the 13 genes showed expression during the formation of the human face. Most of the genes were biased towards some mesenchymal cells subtype except for the PDZD8 and ABCB1 genes (Fig. 2 and Supplementary Fig. 1). Nine of the 13 genes had mouse orthologs (ALKBH8, CENPF, CSAD, EXPH5, MASP2, PDZD8, SLC16A9, TTC12, and TTC28) and were analyzed using the SysFACE gene expression analysis tool. Seven of the Nine genes (ALKBH8, CENPF, CSAD, EXPH5, PDZD8, SLC16A9, and TTC28) showed consistently high expression and enrichment in relevant mouse craniofacial tissues – maxillary, medial and lateral eminence, and palate) (Fig. 3 and supplementary Fig. 2). Interestingly, these seven genes were also among the 10 genes that showed expression during human face development (Fig. 2).
DISCUSSION
We conducted gene-based rare variant aggregation tests to identify novel candidate genes that could explain the missing heritability for NSOFCs. In total, we identified 13 genes with suggestive associations primarily driven by rare missense variations. Seven genes (ALKBH8, CENPF, CSAD, EXPH5, PDZD8, SLC16A9, and TTC28) showed consistent expression in relevant mouse and human craniofacial tissues during the formation of the face and one gene (ABCB1) without a mouse ortholog showed expression in human craniofacial tissues. Further, three genes (ABCB1, TTC28, and PDZD8) were predicted to be intolerant to missense variations.
Using biological plausibility to prioritize loci/genes with suggestive association in GWAS has been previously shown as a valid approach to bypassing the large sample size requirement needed to achieve significant association^22^. While gene mutation constraint is a good metric for identifying pathogenic genes, it is more commonly seen with a dominant disease-causing gene and may not be informative in other disease models (e.g., recessive). Thus, we prioritized the 8 associated genes (ABCB1, ALKBH8, CENPF, CSAD, EXPH5, PDZD8, SLC16A9, and TTC28) with consistent expression in relevant craniofacial tissues during human or mouse face development. Majority of these genes (TTC28, CSAD, EXPH5, SLC16A9, ALKBH8, and CENPF) showed biased expression towards mesenchymal cells subtypes from human craniofacial tissues. This could point to the importance of mesenchymal cells in palate formation since these genes were associated with either NSCLP or NSCPO and not NSCLO. Moreover, this could also be due to the stage (CS17) of embryonic development at which the facial prominences were harvested. The CS17 stage (~ 7 weeks post fertilization), a period that coincides with the later stages of lip formation and the beginning of palate formation.
Five genes (ABCB1, TTC28, PDZD8, CENPF, and ALKBH8) have been previously implicated in NSOFCs or diseases presenting with cleft phenotypes. The ABCB1 and TTC28 were associated with NSCL/P while the PDZD8, CENPF, ALKBH8 genes are associated with NSCPO. These genes except the ABCB1 without a mouse ortholog showed consistently high expression and enrichment in mouse craniofacial tissues especially the palate. The ABCB1 gene is an ATB binding cassette gene that functions to regulate fetal exposure to xenobiotics through the placenta^23^. Single nucleotide variations in the ABCB1 gene have been reported to increase the risk of NSCL/P^23^. The TTC28 gene is in the 22q12.2 region and previous case report on patients with microdeletion of this region implicate this gene as potential candidate for pierre robin sequence- a condition that presents with cleft palate^24^. Furthermore, copy number variations overlapping this gene have been reported in cleft palate patients^25^. The PDZD8 gene assists with lipid transfer from the endoplasmic reticulum to the endosomes and lysosomes^26,27^. Burden of variations though not statistically significant have been previously reported in this gene among patients with cleft lip with or without palate^28^. CENPF is a kinetochore associated protein that colocalizes with the IFT88 (a ciliopathy gene) and compound heterozygous mutations in the CEPNF gene were reported in a human fetus with ciliopathic malformations, including cleft palate^29^. Mutations in ALKBH8 cause intellectual developmental disorder, autosomal recessive 71, MRT71 (OMIM #618504) in humans. This condition presents with craniofacial dysmorphic features, which include long lips with V-shaped upper lip, macrostomia, and retruded mandible^30^. Macrostomia and retruded mandible may cause cleft palate by impeding the elevation of palatal shelves during palate formation^31^.
We identified three potential novel genes (CSAD, EXPH5, SLC16A9) for NSOFCs. The burden of variants in CSAD and EXPH5 were associated with NSCLP while those in the SLC16A9 gene were associated with NSCPO. Although these genes lack previous reports of direct association with NSOFCs phenotypes, they have been implicated in processes crucial for craniofacial development. The EXPH5 gene, for instance, has been shown to play a role in cell-cell adhesion^32^; a critical process in craniofacial morphogenesis. The SLC16A9 gene is linked to lipid metabolic traits^33^; a functionally relevant downstream target of the non-canonical transforming growth factor beta (TGFbeta) signaling - a signaling mechanism involved in face formation. The CSAD gene functions in the biosynthesis of taurine and the role of taurine in organogenesis has been demonstrated in mice^34^.
In the current study, we replicated the ABCB1 gene association which was previously reported for NSCL/P in a common variants’ association study. While some reports have demonstrated the accumulation of rare variants in genes previously identified through common variants association analyses^35^, suggesting a convergence of common and rare variants in loci/genes associated with NSOFCs phenotypes, other reports suggest that common and rare variants may act through separate loci/genes^28^. For instance, targeted sequencing of regions surrounding genome-wide significant loci for NSOFCs showed no evidence of rare variants burden in genes/regulatory regions proximal to these loci^28^. Additionally, rare variants are population-dependent, which could explain why we did not replicate previously reported genes/loci from rare variant association studies in other populations. Furthermore, our tests of association require that we exclude any gene with only one rare variant, even if the same gene harbored common variants. This might have resulted in the omission of genes with contributions from both rare and common variants if participants in our cohort only harbor one rare variant in the gene. Therefore, future studies should consider a model that allows for the incorporation of both common and rare variants.
Our study has some limitations. First, we used array-based genotype data for discovery. This approach means some rare variants were not examined. Second, we controlled for population stratification by adding the top 10 genotype PCs as covariates in our gene-based association regression models. Adjusting for top PCs has been shown to prevent p-value inflation and reduce false positive rates in common variant analysis, but its performance for rare variant analysis remains controversial^36^. The reason is that rare variants, being newer mutations, reflect a more granular population substructure compared to common variants^37^. Finally, we restricted our analysis to only the coding (missense and Lof) and splice-altering variants. Rare variants including insertions/deletions in non-coding regulatory regions also contribute to the etiology of NSOFCs; however, defining these regions’ analytical units and their functional characterization remains a challenge. Hence, future studies should use WGS data for the gene-based analysis to capture more rare variants and leverage annotated craniofacial enhancers regions to analyze rare variants in non-coding regions.
In summary, we identified 13 genes with suggestive associations in our GWAS data. Among the 13 genes, 3 genes (ABCB1, TTC28 and PDZD8) were predicted to be intolerant to missense variations. Human and mouse transcriptomics data further supported the association of 8 genes. Of the 8 genes, five genes (ABCB1, TTC28, PDZD8, CENPF, ALKBH8) were previously associated with NSOFCs or diseases presenting with cleft phenotypes. The remaining three genes (CSAD, EXPH5, SLC16A9) are potentially novel candidate genes for NSOFCs.
METHODS
GWAS Study Participants
The details of the GWAS participants have been previously published^38^. Briefly, NSOFC case children were recruited during surgical repair at cleft clinics and free surgical missions sponsored by Smile Train in Nigeria, Ghana, and Ethiopia. The cleft surgeons at each participating site used a standardized phenotyping protocol (physical examination and clinical photographs) to confirm NSOFC status. Additionally, echocardiography was used to rule out the presence of congenital heart defects to ensure nonsyndromic status. Control children were those without a birth defect diagnosis attending immunization/welfare clinics at the same center where the case children were recruited. To be eligible to participate in the study, case and control children must have biological parents of African ancestry who reside in Africa. Our sample included 1019 NSOFC case children and 2150 unrelated control children. Among the cases, 810 had non syndromic cleft lip with or without palate (NSCL/P) – 394 non-syndromic cleft lip only (NSCLO), 420 non-syndromic cleft lip and palate (NSCLP), and 205 had non-syndromic cleft palate only (NSCPO). The distribution of the GWAS study participants by cleft status and country of origin is shown in Supplementary Table 1.
Data Collection, DNA Extraction, and Genotyping
Demographic information (age, sex, and residential location) and limited exposure information were obtained. Saliva specimens were collected using the Oragene saliva kit, de-identified and shipped to the Butali laboratory at the University of Iowa. DNA was extracted from the saliva using the standard Oragene saliva DNA extraction protocol and quantified using Qubit (http://www.invitrogen.com/site/us/en/home/brands/Product-Brand/Qubit.html;Thermo Fisher Scientific,Grand Island, NY). As part of internal QC, Taqman XY genotyping was used for sex confirmation. Subsequently, aliquoted DNA was sent to the Center for Inherited Disease Research (Baltimore, Maryland, USA) for genotyping. The Illumina Multi-Ethnic Genotyping Array MEGA v2 15070954 A2 (genome build 37), which has over 2 million variants including over 60 000 rare variants selected from populations of African origin, was used for the genotyping. Details on genotyping and QC measures have been previously published^38^.
Data Analysis
Gene-Based Analyses
Variant predication was performed using annotate variation (ANNOVAR) to identify the functional consequences (synonymous, nonsynonymous, and splice-altering) of each variant. Splice altering and non-synonymous variants (variations resulting in either amino acid change or premature termination of the protein) were filtered against the 1000 genome (1KG) population databases (http://www.1000genomes.org/) to identify rare variants (found in < = 1% of Africans included in the database). Additionally, we filtered for only the variants with a MAF < = 1% in the controls included in our sample (Supplementary table 2). Subsequently, genes with two or more rare non-synonymous variants in the data were filtered to satisfy the aggregation requirement for the proposed analyses (Fig. 1). For the analysis, both non-synonymous and splice-altering variants were included and referred to as “protein-altering”. Three gene-based collapsing models were used: (1) protein-altering (PA), (2) missense only (MO), and (3) loss of function only (LOFO). The different phenotypes (NSCL/P and NSCPO) were analyzed independently and separately. NSCL/P was further subdivided into nonsyndromic cleft lip only (NSCLO) and nonsyndromic cleft lip and palate (NSCLP). Gene-based rare variant aggregate tests were used to identify genes enriched in rare variants associated with NSOFCs. Three rare variant aggregation tests were conducted: the combined multivariate and collapsing (CMC) test, the sequence kernel association test (SKAT), and the SKAT-O test. The first two tests are complementary, operating under different assumptions^39^. The CMC test, being a burden test, assumes the same direction of effects for all the variants within a gene, whereas SKAT is a variance component test that allows for an opposing direction of effects^39^. The SKAT-O test is an omnibus test that is more robust and efficient across different scenarios^39^. In all three tests, population stratification and sex were controlled for by adding the top 18 principal components (PC) and child sex into the regression-based models. The lack of prior knowledge about the underlying biology of these variants precluded the ability to select one optimal test. Hence, the decision to conduct the three tests where the CMC and SKAT will show rigor and SKAT-O will confirm reproducibility. The Bonferroni correction was used to adjust for multiple testing and set the cutoff for statistical significance at a 5% error rate to 0.05 divided by the number of genes tested and a 10^2^ higher threshold for suggestive significance. A gene was considered significant if it showed a statistically significant or suggestive significant association in either CMC and SKATO or SKAT and SKATO. Rare variant aggregation tests were conducted on our GWAS data using the SKAT R package (https://cran.r-project.org/web/packages/SKAT/index.html) or rare variant association tests implemented under the case-control study design. Further, we used the genomeAD gene mutation constraint prediction tool (https://gnomad.broadinstitute.org/help/constraint) to identify associated genes level of intolerance to mutational changes.
Gene Expression Analyses
To provide biological insight, expression of the associated genes during organogenesis of the human and mouse faces was examined. The human craniofacial gene expression dataset was generated from single-nuclei RNA-seq of craniofacial prominences of human CS17 embryos. The CS17 stage corresponds to ~ 7 weeks post-fertilization which coincides with the later stage of lip formation and the early stage of palate formation. Additional details on the RNA seq data quality control and analysis was reported by Yankee et al. 2023^27^.
SysFACE analysis was performed as previously described^40^ using GSE7759, GSE22989, GSE31004, and GSE11400 microarray data (Affymetrix Mouse Genome 430 2.0 Array) and GSE55965 microarray data (Affymetrix Mouse Gene 1.0 ST Array). The datasets were analyzed using affy package in R. Multiple probesets representing individual genes were normalized and the probeset with highest median expression was considered representative of gene expression. WB data generated on Affymetrix Mouse Genome 430 2.0 Array platform as previously described was used for enriched expression analysis.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Mossey P. A. & Modell B. Epidemiology of oral clefts 2012: an international perspective. Front Oral Biol 16, 1–18 (2012). 10.1159/00033746422759666 · doi ↗ · pubmed ↗
- 2Calzolari E. Associated anomalies in multi-malformed infants with cleft lip and palate: An epidemiologic study of nearly 6 million births in 23 EUROCAT registries. American journal of medical genetics Part A 143, 528–537 (2007).10.1002/ajmg.a.3144717286264 · doi ↗ · pubmed ↗
- 3Banerjee M. & Dhakar A. S. Epidemiology-clinical profile of cleft lip and palate among children in India and its surgical consideration. CJS 2, 45–51 (2013).
- 4Arth A. C., Tinker S. C., Simeone R. M., Ailes E. C., Cragan J. D. & Grosse S. D. Inpatient Hospitalization Costs Associated with Birth Defects Among Persons of All Ages - United States, 2013. MMWR Morb Mortal Wkly Rep 66, 41–46 (2017). 10.15585/mmwr.mm 6602 a 128103210 PMC 5657658 · doi ↗ · pubmed ↗
- 5Nicholas D. S., Jean Calleja A., Gareth D., Felicity V. M., Peter H. & Martin P. Equality in cleft and craniofacial care. Equality in cleft and craniofacial care 7, 35 (2020). 10.20517/2347-9264.2020.99 · doi ↗
- 6Beaty T. H., Marazita M. L. & Leslie E. J. Genetic factors influencing risk to orofacial clefts: today’s challenges and tomorrow’s opportunities. F 1000 Res 5, 2800 (2016). 10.12688/f 1000 research.9503.127990279 PMC 5133690 · doi ↗ · pubmed ↗
- 7Alade A., Awotoye W. & Butali A. Genetic and epigenetic studies in non-syndromic oral clefts. Oral Dis 28, 1339–1350 (2022). 10.1111/odi.1414635122708 · doi ↗ · pubmed ↗
- 8Leslie E. J. Association studies of low-frequency coding variants in nonsyndromic cleft lip with or without cleft palate. Am J Med Genet A 173, 1531–1538 (2017). 10.1002/ajmg.a.3821028425186 PMC 5444956 · doi ↗ · pubmed ↗
