# Comparative GWAS using global and Indian Reference Panels reveals non-coding drivers of COVID-19 severity and mortality

**Authors:** Aastha Kaushik, Ramakant Mohite, Ranjeet Maurya, Bansidhar Tarai, Sandeep Budhiraja, Uzma Shamim, Rajesh Pandey, David Safronetz, Max Carlos Ramírez-Soto

PMC · DOI: 10.1371/journal.pntd.0014020 · PLOS Neglected Tropical Diseases · 2026-03-03

## TL;DR

This study finds that using an Indian-specific genetic reference panel reveals new genetic factors linked to severe and fatal outcomes in Indian patients with COVID-19.

## Contribution

The study demonstrates that population-specific genetic reference panels uncover unique genetic signals missed by global datasets in underrepresented populations.

## Key findings

- The IndiGen reference panel identified risk variants linked to alveolar collapse and fibrotic remodelling in severe COVID-19.
- Population-specific signals, such as rs10096505 near SFTPC/BMP1, were missed by global datasets.
- IndiGen-specific variants were associated with immune dysregulation in fatal outcomes.

## Abstract

India remains underrepresented in global genomic studies. We hypothesized that population-specific genetic variants contribute to COVID-19 severity and outcomes, and that the choice of reference panel during imputation affects the resolution of Genome-Wide Association Studies (GWAS). Integrating both global and indigenous reference panels may reveal unique and shared genetic associations that are otherwise missed by standard analyses. In this study, we performed a comparative GWAS using Indian population-specific (IndiGen) and global (1000 Genomes Project/1KGenomes) reference panels to identify genetic loci associated with differential COVID-19 severity and mortality among Indian patients. Genomic DNA was extracted and genotyped from patients who were stratified using clinical data on COVID-19 symptoms and clinical outcomes. Quality control, liftover, phasing and imputation were performed on the genotype data. GWAS was performed separately for the severity and mortality phenotypes. Significant loci were functionally annotated using Linkage Disequilibrium (LD) analysis, eQTL mapping, and gene annotation tools. Comparative GWAS with the 1KGenomes and IndiGen panels revealed both shared and unique loci. 1KGenomes identified protective variants near MIR4432HG involved in endothelial stability, while IndiGen uncovered risk variants, including rs10096505 (SFTPC/BMP1), linked to alveolar collapse and fibrotic remodelling. rs9547631 was common to both panels for mortality, whereas IndiGen-specific risk variants (rs78554880, rs112982286, rs111390553, and rs79900659) were associated with immune dysregulation. Functional annotation of these loci pointed to biologically plausible links to COVID-19 severity and fatal outcomes. In brief, the indigenous reference panel improved variant discovery and LD resolution, highlighting that population-specific signals are missed by generic global datasets. Our findings underscore the importance of inclusive genomic resources for accurate association mapping in underrepresented populations.
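
As a rough illustration of what a single-variant association test in a GWAS measures, the sketch below computes an allelic odds ratio and a 1-degree-of-freedom chi-square test from a 2x2 allele-count table. It is not the authors' pipeline: the abstract does not name the software or statistical model used, and the function names and allele counts here are made up for illustration. An odds ratio above 1 marks a risk allele and below 1 a protective allele, which is the sense in which the abstract uses those terms.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds_ratio, chi_square) for a 2x2 allele-count table.

    Rows are cases/controls, columns are alternate/reference allele counts.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, chi_square


def p_value_1df(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


def direction(odds_ratio):
    """Classify the alternate allele by the direction of its effect."""
    if odds_ratio > 1:
        return "risk"
    if odds_ratio < 1:
        return "protective"
    return "neutral"


# Hypothetical allele counts, NOT taken from the paper.
odds_ratio, chi_square = allelic_association(
    case_alt=60, case_ref=140, ctrl_alt=30, ctrl_ref=170
)
p_value = p_value_1df(chi_square)
print(direction(odds_ratio), round(odds_ratio, 2), round(chi_square, 2))
```

In practice a GWAS usually fits a regression model per variant with covariates such as age, sex and ancestry components, and applies a genome-wide significance threshold; the table-based test above only conveys the core idea.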

Most genetic studies have focused on people of European descent, leaving South Asian populations, especially those from India, largely underrepresented. To help fill this gap, we studied the genetic makeup of Indian individuals with different COVID-19 severity levels and outcomes, ranging from recovery to death. We wanted to understand why some people become severely ill while others recover, and whether genetic differences might help explain this. To do so, we analysed each person’s DNA and used two different datasets to fill in missing genetic information. The first, 1KGenomes, represents global populations; the second, IndiGen, is specific to the Indian population. The Indian-specific dataset helped us discover more genetic differences, including some that were missed by the global reference. These differences were linked to important biological processes such as lung function and immune response. For instance, we identified a variant located near the SFTPC and BMP1 genes, which are associated with impaired surfactant production and lung fibrosis. In patients who did not survive, we saw strong genetic signals associated with immune system regulation. In short, our study captures novel genetic signals with potential links to COVID-19 pathophysiology and highlights the importance of using tailored genomic resources to improve the accuracy of association findings.
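
The comparison between the two reference panels can be pictured as a set operation over the significant variants from each run. The sketch below is a minimal illustration under stated assumptions: `compare_panels` is a hypothetical helper, and the input sets contain only the mortality variants named in the abstract. The abstract does not say whether the 1KGenomes run produced further mortality hits, so treating rs9547631 as its only hit is an assumption.

```python
def compare_panels(hits_global, hits_local):
    """Split two sets of significant variant IDs into shared and panel-specific hits."""
    return {
        "shared": hits_global & hits_local,
        "global_only": hits_global - hits_local,
        "local_only": hits_local - hits_global,
    }


# Mortality variants named in the abstract only; the real hit lists may be longer.
kg_mortality = {"rs9547631"}  # assumption: no other 1KGenomes mortality hits listed
indigen_mortality = {
    "rs9547631",
    "rs78554880",
    "rs112982286",
    "rs111390553",
    "rs79900659",
}

result = compare_panels(kg_mortality, indigen_mortality)
for label, variants in result.items():
    print(label, sorted(variants))
```

A real comparison would likely match variants by genomic position and alleles, or by LD block, as well as by rsID, since the same signal can be tagged by different lead variants under different panels.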

## Linked entities

- **Genes:** MIR4432HG (MIR4432 host gene) [NCBI Gene 106660609], SFTPC (surfactant protein C) [NCBI Gene 6440], BMP1 (bone morphogenetic protein 1) [NCBI Gene 649]
- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Genes:** MIR4432HG (MIR4432 host gene) [NCBI Gene 106660609], DHX15 (DEAH-box helicase 15) [NCBI Gene 1665] {aka DBP1, DDX15, PRP43, PRPF43, PrPp43p, hPrp43}, H2ACP1 (H2AC histone family pseudogene 1) [NCBI Gene 100509927] {aka HIST1H2APS6}, IL6 (interleukin 6) [NCBI Gene 3569] {aka BSF-2, BSF2, CDF, HGF, HSF, IFN-beta-2}, POLR3D (RNA polymerase III subunit D) [NCBI Gene 661] {aka BN51T, C53, RPC4, RPC53, TSBN51}, MIR4432 (microRNA 4432) [NCBI Gene 100616473], AGT (angiotensinogen) [NCBI Gene 183] {aka ANHU, SERPINA8, hFLT1}, TGFB1 (transforming growth factor beta 1) [NCBI Gene 7040] {aka CAEND1, CED, DPD1, IBDIMDE, LAP, TGF-beta1}, BMP1 (bone morphogenetic protein 1) [NCBI Gene 649] {aka OI13, PCOLC, PCP, TLD}, RFX3 (regulatory factor X3) [NCBI Gene 5991], NUDT18 (nudix hydrolase 18) [NCBI Gene 79873] {aka MTH3}, FGFBP1 (fibroblast growth factor binding protein 1) [NCBI Gene 9982] {aka FGF-BP, FGF-BP1, FGFBP, FGFBP-1, HBP17}, CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}, UBXN10 (UBX domain protein 10) [NCBI Gene 127733] {aka UBXD3}, LGR5 (leucine rich repeat containing G protein-coupled receptor 5) [NCBI Gene 8549] {aka FEX, GPR49, GPR67, GRP49, HG38}, PHYHIP (phytanoyl-CoA 2-hydroxylase interacting protein) [NCBI Gene 9796] {aka DYRK1AP3, PAHX-AP, PAHXAP1}, EGLN1 (egl-9 family hypoxia inducible factor 1) [NCBI Gene 54583] {aka C1orf12, ECYT3, HALAH, HIF-PH2, HIFPH2, HPH-2}, LRRC74A (leucine rich repeat containing 74A) [NCBI Gene 145497] {aka C14orf166B, LRRC74}, LZTFL1 (leucine zipper transcription factor like 1) [NCBI Gene 54585] {aka BBS17}, GOT2P2 (GOT2 pseudogene 2) [NCBI Gene 391139] {aka GOT2L2}, IL10 (interleukin 10) [NCBI Gene 3586] {aka CSIF, GVHDS, IL-10, IL10A, TGIF}, ABO (ABO, alpha 1-3-N-acetylgalactosaminyltransferase and alpha 1-3-galactosyltransferase) [NCBI Gene 28] {aka A3GALNT, A3GALT1, GTA, GTB, NAGAT}, SERTM1 (serine rich and transmembrane domain containing 1) [NCBI Gene 400120] {aka C13orf36}, SFTPC 
(surfactant protein C) [NCBI Gene 6440] {aka BRICD6, PSP-C, SFTP2, SMDP2, SP-C}, LINC01115 (long intergenic non-protein coding RNA 1115) [NCBI Gene 339822], TNFSF4 (TNF superfamily member 4) [NCBI Gene 7292] {aka CD134L, CD252, GP34, OX-40L, OX4OL, TNLG2B}, PPARGC1A (PPARG coactivator 1 alpha) [NCBI Gene 10891] {aka LEM6, PGC-1(alpha), PGC-1alpha, PGC-1v, PGC1, PGC1A}, TNFSF18 (TNF superfamily member 18) [NCBI Gene 8995] {aka AITRL, GITRL, TL6, TNLG2A, hGITRL}, TMPRSS2 (transmembrane serine protease 2) [NCBI Gene 7113] {aka PRSS10}, RPL36 (ribosomal protein L36) [NCBI Gene 25873] {aka L36, eL36}, REEP4 (receptor accessory protein 4) [NCBI Gene 80346] {aka C8orf20, PP432, Yip2c}
- **Diseases:** hypertension (MESH:D006973), sore throat (MESH:D010612), death (MESH:D003643), production (MESH:D007787), alveolar collapse (MESH:D001261), rheumatoid arthritis (MESH:D001172), microvascular thrombosis (MESH:D017566), epithelial injury (MESH:D009375), endothelial injury (MESH:D057772), EAS (MESH:D000073605), cardiovascular diseases (MESH:D002318), infected (MESH:D007239), COVID (MESH:D000086382), immune dysfunction (MESH:D007154), splenic (MESH:D013158), fibrotic remodelling (MESH:D020257), pulmonary fibrosis (MESH:D011658), hyper (MESH:D007589), endothelial (MESH:D005642), infectious diseases (MESH:D003141), immune dysregulation (OMIM:614878), surfactant dysfunction (MESH:C580477), cardiovascular and respiratory diseases (MESH:D012140), fibrotic diseases (MESH:D004194), inflammation (MESH:D007249), fibrosis (MESH:D005355), lung injury (MESH:D055370), shortness of breath (MESH:D004417), lung (MESH:D008171), endothelial dysfunction (MESH:D014652), diabetes (MESH:D003920), cancer (MESH:D009369), Neglected Tropical Diseases (MESH:D058069), fatigue (MESH:D005221), respiratory failure (MESH:D012131), alveolar dysfunction (MESH:D011649), impaired pulmonary function (OMIM:608852), multi-organ dysfunction (MESH:D009102), obesity (MESH:D009765), systemic sclerosis (MESH:D012595), hypoxia (MESH:D000860), respiratory distress (MESH:D012128), fever (MESH:D005334)
- **Chemicals:** Ser (MESH:D012694), reactive oxygen species (MESH:D017382), oxygen (MESH:D010100)
- **Species:** Homo sapiens (human, species) [taxon 9606], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]
- **Mutations:** rs35998257, rs9531973, rs10096505, rs1850535, rs2070788, rs35575084, rs34607367, rs8192340, rs112982286, rs1800872, rs9547631, rs6545803, rs17024964, rs78554880, rs35196779, rs1800896, rs479200, rs8192330, rs79900659, rs6545801, rs111390553

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12956133/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12956133/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/PMC12956133/full.md

---
Source: https://tomesphere.com/paper/PMC12956133