CAMK1D as a potential therapeutic target for gut microbiota-driven promotion of lung adenocarcinoma development
Nuo Yan, Yang Zhang, Silin Wang, Sheng Hu, Liancheng Ruan, Yunzhe Wang, Weiqiang Feng, Wenxun Xiong, Wenxiong Zhang, Yiping Wei, Chuan Yao

TL;DR
This study explores how gut bacteria may influence lung cancer development and identifies CAMK1D as a potential new target for treatment.
Contribution
The study identifies CAMK1D as a novel microbiota-related therapeutic target for lung adenocarcinoma through genetic and transcriptomic analyses.
Findings
Prevotella9 and Parabacteroides are causally linked to lung adenocarcinoma.
CAMK1D is upregulated in lung adenocarcinoma cell lines and is proposed as a potential therapeutic target.
Fifteen genes, including CAMK1D, were identified as potential genetic targets through MR and GWAS analyses.
Abstract
The gut microbiome is closely associated with malignant tumors; however, the specific mechanisms by which it contributes to the development of lung adenocarcinoma remain unclear. In this study, we performed a two-sample bidirectional Mendelian randomization (MR) analysis to assess the causal relationship between the gut microbiome and lung adenocarcinoma. By identifying single nucleotide polymorphism markers linked to gut microbiome species, we aimed to discover potential biomarkers for lung adenocarcinoma. These findings may offer new insights into the role of the gut microbiome in the prevention and treatment of lung adenocarcinoma. We used genome-wide association study (GWAS) summary statistics to assess the association between the gut microbiome and lung adenocarcinoma through two-sample MR analysis. Sensitivity analyses were performed to confirm the robustness of the findings.…
Funding
- National Natural Science Foundation of China
- Science and Technology Plan of Jiangxi Provincial Health Commission
- Key Research and Development Program of Jiangxi Province
- Jiangxi Province Graduate Innovation Fund Project
Introduction
According to the latest global cancer map, approximately 2.5 million new lung cancer cases and 1.8 million lung cancer deaths occur annually, making lung cancer the most common cancer worldwide and a leading cause of cancer-related mortality (Siegel et al., 2025). Notably, lung adenocarcinoma has accounted for a steadily increasing proportion of overall lung cancer diagnoses (Zhang et al., 2023b). Several known risk factors contribute to the development of lung adenocarcinoma, including smoking, chronic lung diseases, air pollution, occupational exposures, and genetic factors (Barta, Powell & Wisnivesky, 2019; Chen et al., 2022). However, the specific pathogenic mechanisms remain unclear.
The gut microbiome refers to the community of bacteria residing in the human gastrointestinal tract. This diverse microbial population maintains normal physiological and immune functions in the host intestine (Dickson, Erb-Downward & Huffnagle, 2013; Kuziel & Rakoff-Nahoum, 2022). The gut microbiome may influence lung cancer development and progression. Microbial dysbiosis may lead to metabolic changes, immune suppression, and recruitment of pro-inflammatory factors, thereby promoting lung cancer (Georgiou et al., 2021). Recent theories suggest that because of their shared embryonic origin and similar structures, the respiratory and gastrointestinal tracts have established a distinct interplay known as the gut-lung axis (GLA) (Ge et al., 2021). The gut microbiome produces specific metabolites, including short-chain fatty acids (SCFAs), tryptophan derivatives, polyamines, and secondary bile acids, which interact through the GLA to modulate the immune status of the tumor microenvironment (TME) and host response to immune checkpoint inhibitors (ICIs) (Hagihara et al., 2024; Li et al., 2024). Therefore, the gut microbiome may play a crucial role in precise lung cancer treatment and serve as a biomarker for diagnostic and therapeutic purposes. However, the current research on the relationship between lung adenocarcinoma and the gut microbiome has some limitations. Owing to methodological limitations, the causal relationship between the gut microbiome and lung adenocarcinoma, as well as its potential applications in prevention and treatment, have not been extensively explored.
Mendelian randomization (MR) is a statistical method designed to infer potential causal relationships from observed associations, following the principles of Mendelian genetics and using genetic variations as instrumental variables (IVs) to estimate the causal effect of exposure on an outcome. MR analysis has been widely applied to study the associations between genetic polymorphisms and diseases, including links between Alzheimer’s disease and epilepsy (Fang et al., 2023), autoimmune disorders (Yuan et al., 2023), and various types of cancer (Xiao et al., 2023). Unlike in traditional observational studies, we used single nucleotide polymorphisms (SNPs) as IVs and applied two-sample MR to infer a potential causal relationship between the gut microbiome and lung adenocarcinoma (Bowden & Holmes, 2019; Weith & Beyer, 2023). This approach was undertaken to elucidate the role of the gut microbial community in the development and progression of lung adenocarcinoma. Furthermore, this study integrated single-cell transcriptomics, bulk RNA sequencing, bioinformatics analysis, and real-time quantitative PCR to comprehensively reveal the potential mechanisms by which the gut microbiome contributes to the development of lung adenocarcinoma. Single-cell transcriptomics provides high-resolution information on cell types and functional characteristics, aiding in a better understanding of the interactions between molecular variations in the gut microbiome and lung adenocarcinoma (De Zuani et al., 2024). Bulk RNA sequencing provides global gene expression data to validate and complement the findings from single-cell transcriptomics. Concurrently, bioinformatics analyses were performed to elucidate the mechanisms of key biomarkers, which could then be preliminarily validated through quantitative real-time PCR (qRT-PCR) (Zhang et al., 2023a).
This study aimed to integrate genome-wide association study (GWAS) data within a bidirectional two-sample MR framework. By analyzing Gene Expression Omnibus (GEO) single-cell and transcriptomic data, we validated our findings in cellular models to elucidate the relationship between the gut microbiome and lung adenocarcinoma. Additionally, we explored potential biological targets for gut microbiome-targeted therapies for lung adenocarcinoma, provided new insights into its pathogenesis, and established a theoretical foundation for diagnosis and treatment based on the gut microbiome.
Methods
Study design
This study adhered to the STROBE-MR guidelines (Skrivankova et al., 2021) and upheld the core principles of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) initiative (Von Elm et al., 2007). The MR approach was based on three assumptions (Bowden, Davey Smith & Burgess, 2015). (1) The genetic variants used as IVs are associated with specific gut microbial communities. (2) These genetic variants are not associated with the unmeasured confounders related to lung adenocarcinoma. (3) These genetic variants influence lung adenocarcinoma solely through their effects on gut microbial communities and not via other pathways. Our analysis used publicly available GWAS summary statistics. The workflow of this study is illustrated in Fig. 1. Finally, reverse MR analysis was conducted to assess whether lung adenocarcinoma in turn affects the gut microbiome (reverse causation).
Overall study design and analytical workflow. GWAS-based bidirectional Mendelian randomization was used to evaluate potential causal links between gut microbiota and lung adenocarcinoma, followed by integrative bioinformatics analyses of bulk and single-cell transcriptomic data and RT-qPCR validation to identify key microbiota-related genes.
Data sources for gut microbiota and lung adenocarcinoma
The genetic data on the gut microbiome used in this study were sourced from the latest GWAS summary data by the MiBioGen consortium (https://mibiogen.gcc.rug.nl/). These data include contributions from 18,340 participants across 24 cohorts from the United States, Canada, Israel, South Korea, Germany, Denmark, the Netherlands, Belgium, Sweden, Finland, and the United Kingdom. The analysis of microbial quantitative trait loci (mbQTLs) in that study generated genome-wide association data for 211 microbial taxa, covering nine phyla, 16 classes, 20 orders, 35 families, and 131 genera (Kurilshikov et al., 2021). We obtained outcome data from a large population-based cohort study of European adults conducted by the MRC Integrative Epidemiology Unit (https://gwas.mrcieu.ac.uk/) (Skrivankova et al., 2021). The total sample size was 65,864, including 11,245 lung adenocarcinoma cases and 54,619 controls, with 10,345,176 SNPs. Additional ethical approval was not required for this study as the original research had already obtained the necessary approval from the relevant ethics committees and institutional review boards.
To assess the potential causal impact of the gut microbiome on lung adenocarcinoma, we selected instrumental variables (IVs) based on the following criteria. SNPs were selected as IVs using a p-value threshold of less than 1.0 × 10^−5^ in the MR analysis (Sanna et al., 2019). Linkage disequilibrium between SNPs was calculated using the 1000 Genomes European reference panel, and only SNPs with r^2^ < 0.001 were retained so that the selected instruments were independent of one another. The clumping procedure in the TwoSampleMR package (version 0.6.3) in R (version 4.3.2) was used to select independent SNPs. The effect allele frequency and F-statistic were calculated for each SNP, and weak IVs with F-statistics of <10 were excluded. IVs with F-statistics >10 were considered sufficiently strong to support causal inference (Verbanck et al., 2018), which improves the reliability of the MR analysis.
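The instrument-strength filter described above can be illustrated with a minimal Python sketch. It is not the authors' R pipeline. The text does not give the F-statistic formula, so the sketch assumes the common per-SNP approximation F = R²(N − 2)/(1 − R²) with R² = 2 · EAF · (1 − EAF) · β² for a standardized exposure. The SNP identifiers and values are hypothetical and chosen only to exercise each branch of the filter; linkage-disequilibrium clumping is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str     # hypothetical identifier
    beta: float   # SNP-exposure effect (assumed standardized)
    eaf: float    # effect allele frequency
    pval: float   # SNP-exposure association p-value
    n: int        # sample size behind the estimate


def f_statistic(snp: Snp) -> float:
    """Assumed approximation: F = R^2 (N - 2) / (1 - R^2),
    with R^2 = 2 * EAF * (1 - EAF) * beta^2."""
    r2 = 2.0 * snp.eaf * (1.0 - snp.eaf) * snp.beta ** 2
    return r2 * (snp.n - 2) / (1.0 - r2)


def select_instruments(snps, p_threshold=1e-5, min_f=10.0):
    """Keep SNPs with p below the threshold and F of at least min_f."""
    return [s for s in snps if s.pval < p_threshold and f_statistic(s) >= min_f]


# Hypothetical inputs; the p-values and effect sizes are not mutually calibrated.
candidates = [
    Snp("rs_a", beta=0.10, eaf=0.30, pval=2e-6, n=18340),  # strong, kept
    Snp("rs_b", beta=0.03, eaf=0.30, pval=4e-6, n=18340),  # F < 10, dropped
    Snp("rs_c", beta=0.10, eaf=0.30, pval=3e-4, n=18340),  # p too large, dropped
]
instruments = select_instruments(candidates)
print([s.rsid for s in instruments])
```

In practice the per-SNP sample size can differ between variants, so N should be taken from the summary statistics rather than fixed as in this toy input.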
Mendelian randomization statistical analysis
This study primarily employed the inverse variance weighted (IVW) method, while also incorporating MR-Egger, weighted median, simple mode, and weighted mode approaches in MR analysis to calculate causal estimates between the gut microbiome and lung adenocarcinoma (Bowden et al., 2016; Burgess, Small & Thompson, 2017; Burgess & Thompson, 2017; Hartwig, Davey Smith & Bowden, 2017; Sanderson, Spiller & Bowden, 2021). The IVW method utilizes a weighted linear regression with weights based on the inverse of the variance of genetic associations with the outcome, ensuring precision in the estimates. The MR-Egger regression extends this approach to address pleiotropy by adjusting for genetic variants that influence outcomes through irrelevant pathways. The weighted median estimator provides a reliable causal estimate that is valid even if up to 50% of the instruments are invalid. The simple mode method estimates the causal effects by identifying the most common estimate across instruments, thereby providing resistance to outliers. The weighted-mode method further enhances robustness by assigning more weight to precise estimates in the mode-based analysis.
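As an illustration of the IVW idea described above, the following Python sketch computes a fixed-effect IVW estimate as a weighted regression of SNP-outcome effects on SNP-exposure effects through the origin, with weights equal to the inverse variance of the SNP-outcome associations. The input values are hypothetical, and standard errors reported by a given software package may differ, for example when a random-effects variant is used.

```python
import math


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse variance weighted estimate and its standard error."""
    weights = [1.0 / s ** 2 for s in se_out]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exp, beta_out))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_exp))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical per-SNP summary statistics (log-odds scale for the outcome).
beta_exp = [0.10, 0.20, 0.15]
beta_out = [0.02, 0.05, 0.03]
se_out = [0.01, 0.02, 0.015]

beta_ivw, se_ivw = ivw_estimate(beta_exp, beta_out, se_out)
odds_ratio = math.exp(beta_ivw)  # OR per unit increase in the exposure
print(beta_ivw, se_ivw, odds_ratio)
```

Exponentiating the estimate gives an odds ratio, which is the scale on which the MR results in this paper are reported.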
We conducted a series of sensitivity analyses to explore potential heterogeneity and pleiotropy in the key estimates. Initially, we used the global outlier detection test (MR-PRESSO) to detect heterogeneity (Verbanck et al., 2018). Subsequently, we applied the intercept test from the MR-Egger regression to assess horizontal pleiotropy. A non-significant p-value (P > 0.05) was interpreted as indicating no significant heterogeneity or pleiotropy. Additionally, we performed a leave-one-out analysis to evaluate whether individual SNPs could bias or drive causal estimates (Guan et al., 2022). This was performed by sequentially removing each instrumental SNP and repeating the IVW analysis.
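The leave-one-out procedure described above can be sketched in a few lines of Python: each instrument is dropped in turn and the IVW estimate is recomputed on the remaining SNPs. This is a simplified illustration with hypothetical inputs and a fixed-effect IVW estimator, not a reproduction of the package routine used in the study.

```python
def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW point estimate."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx * bx / s ** 2 for bx, s in zip(beta_exp, se_out))
    return num / den


def leave_one_out(rsids, beta_exp, beta_out, se_out):
    """Recompute the IVW estimate after removing each SNP in turn."""
    estimates = {}
    for i, rsid in enumerate(rsids):
        keep = [j for j in range(len(rsids)) if j != i]
        estimates[rsid] = ivw(
            [beta_exp[j] for j in keep],
            [beta_out[j] for j in keep],
            [se_out[j] for j in keep],
        )
    return estimates


# Hypothetical summary statistics.
rsids = ["rs_1", "rs_2", "rs_3"]
loo = leave_one_out(rsids, [0.10, 0.20, 0.15], [0.02, 0.05, 0.03], [0.01, 0.02, 0.015])
for rsid, estimate in loo.items():
    print(f"without {rsid}: beta = {estimate:.3f}")
```

If one estimate departs sharply from the others, the SNP omitted in that run is a candidate driver of the overall result.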
All statistical analyses were performed using R software (version 4.3.2). We used the gwasglue package (version 0.0.0.9) to process the GWAS data and applied the TwoSampleMR package (version 0.6.3) to implement methods, including IVW averaging, MR-Egger regression, weighted median, simple mode, and weighted mode (Hemani et al., 2018). We applied the mr_heterogeneity and mr_pleiotropy_test functions to test for heterogeneity and pleiotropy in the datasets (Bowden & Holmes, 2019). The MR-PRESSO test was performed using the MRPRESSO package (version 1.0). The significance threshold was set at a p-value of less than 0.05.
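Heterogeneity among instruments is commonly quantified with Cochran's Q around the IVW estimate. The sketch below assumes that statistic and uses hypothetical inputs; it returns Q and its degrees of freedom only, because the chi-square p-value is not available in the Python standard library. It should not be read as the exact computation performed by the R functions named above.

```python
def cochran_q(beta_exp, beta_out, se_out):
    """Cochran's Q for the IVW model: sum of weighted squared residuals."""
    weights = [1.0 / s ** 2 for s in se_out]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exp, beta_out))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_exp))
    beta_ivw = num / den
    q = sum(
        w * (by - beta_ivw * bx) ** 2
        for w, bx, by in zip(weights, beta_exp, beta_out)
    )
    return q, len(beta_exp) - 1  # Q and degrees of freedom


# Hypothetical summary statistics.
q_stat, q_df = cochran_q([0.10, 0.20, 0.15], [0.02, 0.05, 0.03], [0.01, 0.02, 0.015])
print(q_stat, q_df)
```

A Q value well above its degrees of freedom would suggest more dispersion among the per-SNP estimates than sampling error alone explains.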
Mapping SNPs to Genes
We used the online database SNPnexus (https://biit.cs.ut.ee/gprofiler/snpense) to map each queried variant to its nearest gene. This database offers a comprehensive genomic variant annotation tool that identifies and maps genetic variants to related genes and their functional annotations regardless of whether the gene is overlapping, downstream, or upstream (Oscanoa et al., 2020).
Gene Ontology and KEGG enrichment analyses of key genes
In our MR analysis, an odds ratio (OR) greater than 1 was considered indicative of harmful gut microbiota affecting lung adenocarcinoma, whereas an OR less than 1 was interpreted as beneficial gut microbiota. We conducted a functional enrichment analysis of the genes mapped from the genetic variants associated with each group of gut microbiota. We used the clusterProfiler package (version 4.10.1) in R software to perform enrichment analysis for GO biological processes (BP), cellular components (CC), molecular functions (MF), and KEGG pathways. We then visualized the top 10 most significant Gene Ontology (GO) terms and pathways using the ggplot2 package in the R software. Results with a p-value less than 0.05 were considered significantly enriched within key gene regulatory networks.
Single-cell RNA sequencing data analysis
We obtained single-cell RNA sequencing (scRNA-seq) data (GSE131907) of lung adenocarcinoma patients from the GEO database of the NCBI (https://www.ncbi.nlm.nih.gov/geo/), which includes 58 samples from 44 lung adenocarcinoma patients: 11 tumor tissue samples, 11 normal lung tissue samples, 10 normal lymph node tissue samples, 10 metastatic brain tissue samples (from untreated lung adenocarcinoma patients undergoing conservative surgery), seven metastatic lymph node samples, four lung tumor tissue samples from advanced lung adenocarcinoma patients, and five pleural effusion samples from lung adenocarcinoma patients with malignant pleural effusion, totaling 208,506 cells (Kim et al., 2020). The publicly available dataset used in this study was approved by the ethics committee.
We performed a detailed analysis of the scRNA-seq data using the Seurat package in R (Hao et al., 2021). Initially, we filtered out genes expressed in fewer than three cells to reduce noise in the analysis. Additionally, we excluded cells in which fewer than 50 genes were detected and those with mitochondrial gene content exceeding 5%, thereby removing potentially low-quality or damaged cells and establishing a foundation for subsequent analyses. Next, we utilized the NormalizeData function to normalize the RNA data and convert them into Seurat objects. We identified 1,500 highly variable genes using the FindVariableFeatures function and extracted the top 10 genes for visualization. We then performed principal component analysis on these highly variable genes using the RunPCA function, selecting the top 20 principal components (PCs). We conducted cell clustering analysis through the FindNeighbors and FindClusters functions, followed by uniform manifold approximation and projection (UMAP) via the RunUMAP function, clustering cells based on UMAP-1 and UMAP-2. To annotate cell types, we used the SingleR package in R, with the Human Primary Cell Atlas as a reference dataset to identify cell populations (Aran et al., 2019). To compare gene expression levels across cell types, we used the FindMarkers function in Seurat, which applies a Wilcoxon rank-sum test by default. Genes with an adjusted p-value (Benjamini–Hochberg correction) <0.05 were considered significantly differentially expressed between cell populations. Finally, we visualized the expression patterns of the aforementioned genes across different cell types using UMAP and bubble plots.
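The quality-control thresholds described above (genes detected in at least three cells, cells with at least 50 detected genes, and at most 5% mitochondrial content) can be illustrated with a small Python sketch. It is a simplified stand-in for the Seurat workflow, not a reproduction of it: the nested-dictionary count layout, the function name, and the `MT-` gene-name prefix used to flag mitochondrial genes are assumptions of this example, and the toy matrix is hypothetical.

```python
def filter_counts(counts, min_cells=3, min_genes=50, max_mito_pct=5.0,
                  mito_prefix="MT-"):
    """counts maps cell id -> {gene: raw count}. Returns the filtered mapping."""
    # Step 1: keep genes detected (count > 0) in at least min_cells cells.
    cells_per_gene = {}
    for genes in counts.values():
        for gene, count in genes.items():
            if count > 0:
                cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    kept_genes = {g for g, n in cells_per_gene.items() if n >= min_cells}

    # Step 2: keep cells with enough detected genes and low mitochondrial share.
    kept_cells = {}
    for cell, genes in counts.items():
        detected = {g: c for g, c in genes.items() if g in kept_genes and c > 0}
        total = sum(detected.values())
        if total == 0 or len(detected) < min_genes:
            continue
        mito = sum(c for g, c in detected.items() if g.startswith(mito_prefix))
        if 100.0 * mito / total <= max_mito_pct:
            kept_cells[cell] = detected
    return kept_cells


# Hypothetical toy matrix; min_genes is lowered so the example stays small.
toy = {
    "c1": {"G1": 10, "G2": 9, "MT-ND1": 1},   # 5% mitochondrial, kept
    "c2": {"G1": 4, "G2": 4, "MT-ND1": 2},    # 20% mitochondrial, dropped
    "c3": {"G1": 7, "RARE": 1},               # one gene left after step 1, dropped
    "c4": {"G1": 30, "G2": 19, "MT-ND1": 1},  # 2% mitochondrial, kept
}
kept = filter_counts(toy, min_genes=2)
print(sorted(kept))
```

With real data the defaults mirror the thresholds stated in the text; only the toy call overrides `min_genes`.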
Bulk RNA data analysis
We downloaded the GSE229705 dataset from the GEO database, which contains the RNA transcriptome expression profiles of human lung adenocarcinoma samples. The dataset includes 123 lung adenocarcinoma tissue samples and 123 matched normal lung tissue samples (Dolgalev et al., 2023). Data were generated using a GPL24676 Illumina NovaSeq 6000 (Homo sapiens) platform. After standardizing, annotating, and cleaning the clinical information of the GSE229705 dataset, we used the limma package in R to identify differentially expressed genes (DEGs) between lung adenocarcinoma and normal lung tissue samples. We set the DEG threshold at a p-value < 0.01 and an absolute log fold change (|logFC|) > 0.5, and extracted the top 500 genes for visualization. We utilized the VennDiagram package (version 1.7.3) in R software to intersect the obtained DEGs with the exposure genes for further analysis.
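The DEG thresholds and the intersection step described above can be sketched as follows. This is an illustration only: the gene names and statistics are placeholders, the table layout is an assumption, and the differential expression statistics themselves would come from limma rather than from this code.

```python
def select_degs(table, p_cutoff=0.01, lfc_cutoff=0.5):
    """table maps gene -> (logFC, p-value). Returns gene -> 'up' or 'down'."""
    degs = {}
    for gene, (log_fc, p_value) in table.items():
        if p_value < p_cutoff and abs(log_fc) > lfc_cutoff:
            degs[gene] = "up" if log_fc > 0 else "down"
    return degs


# Placeholder results, invented for illustration.
toy_table = {
    "GENE_A": (1.20, 0.0004),   # passes both thresholds, upregulated
    "GENE_B": (-0.80, 0.0030),  # passes both thresholds, downregulated
    "GENE_C": (0.30, 0.0001),   # |logFC| too small
    "GENE_D": (-1.50, 0.0400),  # p-value too large
}
mr_mapped_genes = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}  # placeholder gene set

degs = select_degs(toy_table)
overlap = sorted(set(degs) & mr_mapped_genes)
print(degs, overlap)
```

The set intersection plays the role of the Venn diagram: only genes that are both differentially expressed and mapped from MR instruments are carried forward.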
Cell culture and reagents
The resources for the cell lines are listed in the Key Resources Table, and all cell lines were confirmed to be free of mycoplasma contamination. A549, PC-9, NCI-H1299, and NCI-H1650 lung cancer cell lines were obtained from the Cell Bank/Stem Cell Bank of the Chinese Academy of Sciences. The HBE135-E6E7 human bronchial epithelial cell line and NCI-H1975 human lung adenocarcinoma cell line were obtained from ZQXZbio (Shanghai Zhongqiao Xinzhou Biotechnology Co., Ltd.). HBE135-E6E7 cells were cultured in a basal medium containing 10% cytokines (ZQ-1322 ZQXZbio). NCI-H1975, NCI-H1299, and NCI-H1650 cells were cultured in RPMI-1640 medium (CGM112.05 Cellmax) supplemented with 10% fetal bovine serum (A5669701 Gibco). A549 and PC-9 cells were cultured in DMEM (CGM102.05 Cellmax) with 10% fetal bovine serum (A5669701 Gibco, Waltham, MA, USA). All cells were cultured under standard conditions at 37 °C with 5% CO2, 95% humidity, and 21% oxygen concentration.
Validation of hub genes through qRT-PCR
Total RNA was extracted from all cells using the TRIzol reagent (DP424 Tiangen Biotech, Beijing, China), and RNA quality and yield were assessed to ensure suitability for mRNA expression analysis. Total RNA was reverse-transcribed into cDNA using HiScript II Q Select RT SuperMix (R233-01 Vazyme, Nanjing, China). Gene expression levels were quantified utilizing the Power SYBR Green PCR Master Mix (HY-K0523 MedChemExpress, Monmouth Junction, NJ, USA) on a LightCycler^®^ 480 II Real-Time PCR System (Roche, Basel, Switzerland). Relative mRNA expression levels were analyzed using the 2^−ΔΔCT^ method (Sui et al., 2021), with GAPDH as the internal control. Differences among groups were assessed using one-way ANOVA followed by post hoc tests, with p < 0.05 considered statistically significant. Data are presented as mean ± SD from three independent experiments. All primers were designed and synthesized by Sangon Biotech (Shanghai, China). Detailed primer information is provided in Table 1. Reverse transcription quantitative PCR (RT-qPCR) was performed in accordance with the MIQE guidelines (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) (Bustin et al., 2025), with full experimental details provided in the Materials and Methods section and the MIQE checklist.
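The relative quantification used above follows a short calculation: the target gene's Ct is first normalized to the reference gene (GAPDH in this study) within each sample, and the normalized value of the test sample is then compared with that of the control sample. The Python sketch below shows the arithmetic; the Ct values are hypothetical and the function name is chosen for this example.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt method."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the test sample
    delta_control = ct_target_control - ct_ref_control   # dCt in the control sample
    delta_delta = delta_sample - delta_control           # ddCt
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: target vs reference gene in a cancer line and a control line.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
print(fold_change)
```

A fold change above 1 indicates higher relative expression in the test sample than in the control, and a value below 1 indicates lower expression. The calculation assumes roughly equal amplification efficiency for the target and reference genes.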
Table 1: Detailed primer information.
Protein–protein interaction network and WikiPathways enrichment analysis
We established and visualized a protein–protein interaction (PPI) network of key DEGs using the STRING database (https://string-db.org/) and Cytoscape software (version 3.9.0), as described in our previous study. We utilized the NetworkAnalyst platform (https://www.networkanalyst.ca/) to predict transcription factors and construct their interaction networks. Additionally, we leveraged the WikiPathways plugin to explore and uncover relevant pathway network diagrams (Agrawal et al., 2024).
Mendelian randomization estimates for the association of gut microbial taxa with lung adenocarcinoma. (A) Scatter plots of SNP-exposure versus SNP-outcome effects for Prevotella9 (left) and Parabacteroides (right), with fitted lines from inverse variance weighted (IVW), MR-Egger, weighted median, simple mode, and weighted mode methods. (B) Funnel plots showing the distribution and symmetry of SNP-specific causal estimates for each taxon under the IVW and MR-Egger approaches. (C) Forest plots displaying individual SNP causal estimates with corresponding overall IVW and MR-Egger estimates. (D) Leave-one-out analyses showing the IVW and MR-Egger estimates obtained after sequentially excluding each SNP, indicating that no single variant disproportionately influenced the observed associations.
Results
Mendelian randomization results between Prevotella9, Parabacteroides and lung adenocarcinoma
From a pool of 7,089 gut microbiota, we ultimately selected 119 gut microbial taxa, with the number of genetic variants ranging from 3 to 24 SNPs (Tables S1, S2). All F-statistics were greater than 10, indicating no evidence of weak instrument bias. Based on the IVW MR analysis and the MR-PRESSO global test for pleiotropy and heterogeneity, we identified two gut microbial taxa with a causal impact on lung adenocarcinoma as the primary outcome (Fig. 2, Tables S3, S4). There was no evidence of pleiotropy among the genetic IVs of the microbial taxa (p > 0.05). We found that the Prevotella9 taxon (nsnp = 9; OR = 1.303; 95% confidence interval (CI) [1.072–1.585]; IVW method, p = 0.008) and the Parabacteroides taxon (nsnp = 17; OR = 1.181; 95% CI [1.029–1.356]; IVW method, p = 0.018) had a positive causal effect on lung adenocarcinoma outcomes (Fig. 3, Table S5). None of the MR-Egger intercepts were significantly different from zero (all intercept p-values > 0.05), indicating no evidence of horizontal pleiotropy. Additionally, leave-one-out analysis showed no substantial change in the causal estimates for lung adenocarcinoma for either taxon, suggesting that the identified causal associations were not driven by a single instrumental variable. In reverse MR analysis, no evidence of a causal effect of lung adenocarcinoma on these gut microbial taxa was found.
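As a consistency check on the reported estimates, the p-value can be approximately recovered from an odds ratio and its 95% confidence interval. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale and a normal test statistic; these are assumptions of the example, not statements about how the original p-values were computed. The inputs are the ORs and confidence limits quoted in the text above.

```python
import math


def p_from_or_ci(or_value, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p-value from an OR and its 95% CI."""
    log_or = math.log(or_value)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return log_or, se, z, p


# ORs and 95% CIs as reported in the text.
_, _, _, p_prevotella9 = p_from_or_ci(1.303, 1.072, 1.585)
_, _, _, p_parabacteroides = p_from_or_ci(1.181, 1.029, 1.356)
print(round(p_prevotella9, 3), round(p_parabacteroides, 3))
```

Under these assumptions the recovered values round to the reported p = 0.008 and p = 0.018, so the stated ORs, intervals, and p-values are mutually consistent.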
Mendelian randomization estimates for the association between gut microbiota and lung adenocarcinoma risk. Odds ratios (ORs) with 95% confidence intervals are shown for the effect of genetically predicted abundance of Parabacteroides (9 SNPs) and Prevotella9 (17 SNPs) on lung adenocarcinoma, estimated using five Mendelian randomization methods (inverse variance–weighted [IVW], MR-Egger, weighted median, simple mode, and weighted mode). The corresponding p values for each method are reported in the table.
Enrichment of disease-related key genes in GO and KEGG pathways
We mapped the SNP results of the two gut microbiota types and identified 15 related genes: DNAH1, PDE10A, DOCK2, INSYN2B, DNAI3, SUOX, LINC01505, SULT4A1, NT5ELP, LINC02895, calcium/calmodulin dependent protein kinase 1D (CAMK1D), ENSG00000253557, BCAS3, C18orf63, and MYO18B (Table S6). GO analysis revealed that the gut microbiota associated with adverse disease outcomes were primarily enriched in BP such as inner dynein arm assembly, axonemal dynein complex assembly, positive regulation of phagocytosis, axoneme assembly, and regulation of phagocytosis. In terms of CC, they were mainly enriched in the axonemal dynein complex, dynein complex, microtubule-associated complex, axoneme, ciliary plasm, plasma membrane-bound cell projection cytoplasm, and cytoplasmic region. In terms of MF, they primarily included cytoskeletal motor activity, oxidoreductase activity (acting on a sulfur group of donors, with oxygen as an acceptor), T-cell receptor binding, Arp2/3 complex binding, and cGMP binding (Figs. 4A–4B). No significant pathways were identified in the KEGG pathway enrichment analysis.
Enrichment analysis of Gene Ontology (GO) terms for potential disease-related genes associated with gut microbiota. (A) Bar plot showing the enrichment of GO biological processes (BP), cellular components (CC), and molecular functions (MF) for key genes identified from the gut microbiome. The most enriched terms include inner dynein arm assembly, axoneme assembly, and regulation of phagocytosis.
Single-cell RNA-seq analysis of potential gut microbiota–related key genes in lung adenocarcinoma. (A) t-distributed stochastic neighbor embedding (t-SNE) plots of 208,506 cells from 58 samples of 44 patients in the GSE131907 dataset, showing major cell populations in normal lung (left) and lung adenocarcinoma (LUAD) tissue (right), including T cells, B cells, NK cells, monocytes, macrophages, fibroblasts, epithelial cells, endothelial cells, smooth muscle cells, and pre-B CD34^−^ cells. (B) t-SNE feature plots depicting the expression of DNAH1, PDE10A, DOCK2, SUOX, CAMK1D, and BCAS3 in normal (left) and LUAD (right) samples. (C) Dot plots summarizing the average expression (color scale) and proportion of expressing cells (dot size) of the six candidate genes across each cell type in normal and LUAD tissues. (D) Violin plots showing the distribution of expression levels of DNAH1, PDE10A, DOCK2, SUOX, CAMK1D, and BCAS3 in different cell populations from normal and LUAD samples. Differential expression between cell types was assessed using the Wilcoxon rank-sum test with Benjamini–Hochberg correction; genes with adjusted p < 0.05 were considered significantly differentially expressed.
Single-cell sequencing analysis revealed differential expression of key genes across cellular subpopulations
We performed scRNA-seq analysis of 58 lung adenocarcinoma samples. After rigorous quality control and data preprocessing, we utilized UMAP dimensionality reduction to visualize the high-dimensional scRNA-seq data based on the top four PCs (Fig. S1). We successfully classified the cells into 10 distinct subpopulations and annotated them using the SingleR R package, identifying key cell types including NK cells, T cells, B cells, macrophages, monocytes, fibroblasts, epithelial cells, endothelial cells, smooth muscle cells, and pre-B CD34^−^ cells. We found that T cells, monocytes, B cells, and pre-B CD34^−^ cells were significantly enriched in lung adenocarcinoma patients compared to healthy controls, whereas NK cells, fibroblasts, and macrophages were enriched in healthy individuals (Fig. 5A). In lung adenocarcinoma samples, PDE10A showed significantly higher expression in endothelial cells but significantly lower expression in macrophages (adjusted p < 0.05, Wilcoxon rank-sum test). DOCK2 showed significantly higher expression in monocytes, endothelial cells, and pre-B CD34^−^ cells, but significantly lower expression in NK cells, T cells, B cells, and epithelial cells (adjusted p < 0.05, Wilcoxon rank-sum test). SUOX showed significantly higher expression in epithelial cells (adjusted p < 0.05, Wilcoxon rank-sum test). CAMK1D showed significantly higher expression in monocytes, epithelial cells, endothelial cells, and pre-B CD34^−^ cells, but significantly lower expression in T and B cells (adjusted p < 0.05, Wilcoxon rank-sum test). Finally, BCAS3 showed significantly higher expression in smooth muscle cells (adjusted p < 0.05, Wilcoxon rank-sum test) (Figs. 5B–5D).
Furthermore, we performed Monocle analysis and generated pseudotime trajectory plots illustrating the differentiation and evolution of T cells, monocytes, B cells, epithelial cells, smooth muscle cells, NK cells, macrophages, endothelial cells, and pre-B CD34- cells along branching developmental trajectories (Fig. S1).
Bulk RNA analysis and qRT-PCR validated DEGs and predicted associated pathways
Using the GSE229705 dataset, we performed a differential gene expression analysis between healthy individuals and lung adenocarcinoma patients using the limma tool. In the GSE229705 dataset, we identified 10,461 DEGs, of which 5,153 were upregulated and 5,308 were downregulated (Table S7). Correlation heat maps and volcano plots were generated to identify significant associations. To prioritize genes that are both genetically linked to the gut microbiome and transcriptionally dysregulated in LUAD, we intersected the list of 15 microbiota-associated genes with the DEGs. Five overlapping genes (CAMK1D, BCAS3, DNAH1, PDE10A, and C18orf63) were selected for further analysis as microbiota-related LUAD candidate genes (Figs. 6A–6C). CAMK1D was significantly upregulated in lung adenocarcinoma patients compared with healthy controls, whereas BCAS3, DNAH1, PDE10A, and C18orf63 were significantly downregulated (Fig. 6D). qRT-PCR analysis further demonstrated that, compared with normal bronchial epithelial cells (HBE135-E6E7), CAMK1D expression was significantly higher in A549, PC-9, NCI-H1650, and NCI-H1975 lung cancer cells (p < 0.05), whereas BCAS3 expression was significantly lower in A549, NCI-H1299, and NCI-H1650 cells (p < 0.05); DNAH1 expression was significantly lower in A549, NCI-H1299, NCI-H1650, and NCI-H1975 cells (p < 0.05); PDE10A expression was significantly lower in A549, PC-9, NCI-H1299, NCI-H1650, and NCI-H1975 cells (p < 0.05); and C18orf63 expression was significantly lower in PC-9, NCI-H1299, and NCI-H1975 cells (p < 0.05) (Fig. 6E). The expression profiles obtained by qRT-PCR were broadly consistent with the patterns observed in the bulk RNA-seq data, providing additional experimental support for the robustness of our candidate gene selection. Considering the multiple gene and pathway interactions involved in CAMK1D-mediated regulation of lung adenocarcinoma, we used the STRING database to construct a PPI network for CAMK1D.
We identified 10 core genes: CAMK1D, CALM3, CALML3, CALML4, CALML5, CALML6, CREB1, NOS3, CDC123 and DCX (Fig. 6F; Table S8). Furthermore, using NetworkAnalyst, we predicted 42 related transcription factors, suggesting multifaceted mechanisms by which core genes act in lung adenocarcinoma (Fig. 6G). Notably, WikiPathways enrichment analysis revealed that the 10 core genes were primarily enriched in the renin-angiotensin-aldosterone system (RAAS) signal transduction pathway (Fig. 6H).
Identification and validation of microbiota-related LUAD hub genes and CAMK1D-centered regulatory network. (A) Venn diagram showing the overlap between differentially expressed genes (DEGs) in GSE229705 and microbiota-related genes (MRGs) derived from Mendelian randomization-mapped SNPs; five overlapping genes (CAMK1D, BCAS3, DNAH1, PDE10A and C18orf63) were defined as candidate microbiota-related LUAD genes. (B) Volcano plot of DEGs in GSE229705 (123 lung adenocarcinoma tissues vs 123 matched normal lung tissues), highlighting the five candidate genes. (C) Heatmap of the top 500 DEGs between tumor and normal lung tissues in GSE229705, with samples annotated by tissue type. (D) Boxplots showing the expression levels of CAMK1D, BCAS3, DNAH1, PDE10A and C18orf63 in lung adenocarcinoma versus normal lung tissues in GSE229705; p values were calculated using unpaired two-sided t-tests. (E) Validation of CAMK1D, BCAS3, DNAH1, PDE10A and C18orf63 mRNA expression in normal bronchial epithelial cells (HBE135-E6E7) and lung adenocarcinoma cell lines (A549, PC-9, NCI-H1299, NCI-H1650 and NCI-H1975) by RT-qPCR (n = 3 independent experiments per group; data are presented as mean ± SD); statistical significance was assessed using one-way ANOVA followed by post hoc tests. ns, not significant; * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001. (F) Protein–protein interaction (PPI) network of CAMK1D and its interacting partners constructed using the STRING database. (G) Transcription factor (TF)-gene regulatory network for CAMK1D and its PPI partners predicted using NetworkAnalyst. (H) WikiPathways enrichment map of the renin–angiotensin–aldosterone system (RAAS) signaling pathway showing the localization of CAMK1D and related genes within the pathway.
Discussion
Lung adenocarcinoma is a growing global concern owing to its rising incidence rate, diverse risk factors, and severe impact on prognosis. However, our understanding of the composition and function of the gut microbiome and the potential mechanisms linking different microbial communities to lung adenocarcinoma remains limited. The lack of large-scale multi-center clinical research data contributes to an incomplete and imprecise understanding of the relationship between the gut microbiome and lung adenocarcinoma in different patient populations. Therefore, there is an urgent need to identify therapeutic targets related to the role of the gut microbiome in lung adenocarcinoma to provide meaningful diagnostic and treatment options for clinical practice. Our integrated MR and multi-omics approach addresses this gap by identifying Prevotella9 and Parabacteroides as causal risk factors and nominating CAMK1D as a potential therapeutic target.
In this study, we used summary statistics from the largest GWAS meta-analysis of the gut microbiome, conducted by the MiBioGen consortium, together with GWAS summary statistics for lung adenocarcinoma. The bidirectional MR framework strengthens causal inference by testing for reverse causation and assessing pleiotropy, and it is less susceptible to the confounding that affects observational studies. Bidirectional MR also systematically addresses potential biases and confounding factors, which is crucial for establishing causal relationships and identifying the mechanisms by which gut microbiome changes lead to lung adenocarcinoma (Bouras et al., 2022; Li et al., 2023; Long et al., 2023). These methodological advantages strengthen the credibility and scientific rigour of our findings. By limiting the influence of potential confounding factors and considering shared genetic factors, we aimed to obtain reliable causal estimates.
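A minimal sketch of the fixed-effect inverse-variance weighted (IVW) estimator commonly used in two-sample MR may make the framework more concrete. The SNP effect sizes are invented toy values, the function names are ours, and the sketch is not the authors' pipeline.

```python
import math

def wald_ratios(beta_exposure, beta_outcome):
    """Per-SNP causal estimates: SNP-outcome effect / SNP-exposure effect."""
    return [b_out / b_exp for b_exp, b_out in zip(beta_exposure, beta_outcome)]

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate with first-order weights (beta_exp / se_out)^2."""
    weights = [(b_exp / se) ** 2 for b_exp, se in zip(beta_exposure, se_outcome)]
    ratios = wald_ratios(beta_exposure, beta_outcome)
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error

# Toy summary statistics for three instrument SNPs (hypothetical values).
beta_exposure = [0.10, 0.20, 0.15]  # SNP effect on microbial taxon abundance
beta_outcome = [0.02, 0.04, 0.03]   # SNP effect on log-odds of the outcome
se_outcome = [0.01, 0.01, 0.01]

estimate, standard_error = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
odds_ratio = math.exp(estimate)
print(f"beta={estimate:.3f}, se={standard_error:.3f}, OR={odds_ratio:.3f}")
```

A bidirectional analysis repeats the same calculation with the roles swapped, using instruments for the outcome trait and treating the microbial taxon as the outcome.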
The GLA is a major pathway through which the gut microbiome interacts with the lungs (Major et al., 2023). For example, the gut microbial community can influence the immune response by enhancing the effects of innate immune cells (e.g., dendritic cells, macrophages, and natural killer cells) and improving the anti-tumor effects of adaptive immune cells (e.g., CD8+ and CD4+ T cells) (Pizzo et al., 2022; Zhang & Xu, 2023). Prevotella9 is an anaerobic Gram-negative bacterium of the genus Prevotella (Tett et al., 2021). Prevotella9 may influence cell proliferation and apoptosis through its metabolites (e.g., SCFAs), thereby promoting or suppressing cancer development. Parabacteroides is an anaerobic, Gram-negative genus belonging to the phylum Bacteroidetes, class Bacteroidia, and order Bacteroidales. This bacterium may modulate the host immune system and alter the TME (Zhou et al., 2024), thereby affecting cancer progression. In this study, we identified certain gut microbial taxa, including Prevotella9 and Parabacteroides, which may promote lung adenocarcinoma. These taxa likely contribute to LUAD via metabolite-driven immune dysregulation in the tumor microenvironment, as suggested by previous studies on their modulation of SCFAs and host immune responses.
Furthermore, this study explored potential causal genes associated with lung adenocarcinoma. Building on the microbial associations, our gene mapping and enrichment analyses point to downstream pathways that may link gut dysbiosis to LUAD. To better understand the biological functions of these genes in lung adenocarcinoma, we performed GO and KEGG analyses. We found that the detrimental gut microbiota may act through the regulation of the inner arm dynein complex assembly, flagellar microtubule motor complex assembly, positive regulation of phagocytosis, flagellar microtubule organization, and the regulation of phagocytosis. Single-cell analysis revealed that PDE10A was upregulated in endothelial cells but downregulated in macrophages of lung adenocarcinoma patients. DOCK2 was upregulated in monocytes, endothelial cells, and pre-B CD34- cells but downregulated in NK cells, T cells, B cells, and epithelial cells. Additionally, SUOX was upregulated in epithelial cells, and BCAS3 was upregulated in smooth muscle cells. CAMK1D was upregulated in monocytes, epithelial cells, endothelial cells, and pre-B CD34- cells but downregulated in T and B cells. The remaining microbiota-associated genes that did not overlap with DEGs in LUAD may still contribute to disease risk through subtler transcriptional changes or post-transcriptional mechanisms, and warrant future investigation.
Among the five LUAD candidate genes identified via Mendelian randomization and transcriptomic analyses (CAMK1D, BCAS3, DNAH1, PDE10A, and C18orf63), only CAMK1D exhibited consistently high expression across single-cell RNA-seq, bulk RNA-seq, and RT-qPCR datasets. This cross-platform consistency underscores the robustness of CAMK1D as a candidate gene and provides a strong rationale for its prioritization in subsequent mechanistic and network analyses to evaluate its potential as a therapeutic target. CAMK1D has been shown to inhibit glioma growth by modulating the PI3K/AKT/mTOR pathway, a central signaling cascade that regulates tumor cell proliferation and survival (Jin et al., 2022). In lung cancer, circPRKCI promotes the malignant phenotype of LUAD cells via the miR-219a-5p/CAMK1D axis, suggesting that CAMK1D can act as a downstream effector of oncogenic non-coding RNA regulation (Sui et al., 2021). In addition, CAMK1D is co-expressed with PD-L1 in anti-PD-L1/PD-1-refractory tumors, where it is activated upon Fas receptor stimulation and subsequently phosphorylates caspases-3, -6, and -7, thereby inhibiting their activation and blunting apoptosis (Volpin et al., 2020). This mechanism provides a plausible link between CAMK1D and immune evasion in the LUAD tumor microenvironment, consistent with our observation that CAMK1D is upregulated in monocytes, epithelial cells, endothelial cells, and pre-B CD34- cells but downregulated in T and B cells. Finally, CAMK1D has been implicated in CREB-dependent transcription and metabolic regulation (Vivot et al., 2023), which may further influence LUAD cell growth and adaptation to the tumor microenvironment. Taken together, these findings suggest that CAMK1D may promote LUAD progression through multiple pathways, including PI3K/AKT/mTOR signaling, non-coding RNA axes, suppression of apoptosis, and immune modulation.
We explored the function and mechanism of action of CAMK1D in LUAD and its potential links to the gut microbiome. By constructing a PPI network, we identified nine interacting proteins (CALM3, CALML3, CALML4, CALML5, CALML6, CREB1, NOS3, CDC123 and DCX) that may be key to the mechanism by which CAMK1D promotes the development and progression of lung adenocarcinoma. Additionally, we predicted 42 transcription factors that may regulate this network. WikiPathways enrichment analysis of the CAMK1D-centered PPI network revealed that these core genes were primarily enriched in the RAAS signaling pathway. Angiotensin peptides play an important role in cell proliferation, immune-inflammatory responses, hypoxia, and angiogenesis, which are key biological processes in the lung cancer microenvironment (Catarata et al., 2020). A large body of preclinical and clinical data suggests that RAAS inhibitors could play a role in lung cancer treatment (Kocher et al., 2021; Rachow, Schiffl & Lang, 2021; Shen et al., 2016). Thus, targeting CAMK1D within this network may offer synergistic benefits with microbiota-modulating therapies, such as probiotics, to mitigate LUAD progression. Our research revealed a complex gene association network related to the influence of the gut microbiome on lung adenocarcinoma, involving multiple genes in disease pathogenesis. However, additional in vitro, in vivo, and clinical studies are required to validate the mechanism of action of the CAMK1D gene and its therapeutic potential in lung adenocarcinoma.
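Pathway enrichment of a small gene set is often scored with a hypergeometric over-representation test. The sketch below shows that calculation under the assumption that such a test is appropriate here; the source does not state which statistic the enrichment tool applied. The gene counts are hypothetical and the function name is ours.

```python
from math import comb

def enrichment_p_value(universe_size, pathway_size, query_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of pathway genes found when `query_size` genes are drawn
    without replacement from a universe of `universe_size` genes, of which
    `pathway_size` belong to the pathway.
    """
    upper = min(pathway_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 1,000 background genes, a 40-gene pathway,
# a 10-gene query set, 3 of which fall in the pathway.
p_value = enrichment_p_value(1000, 40, 10, 3)
print(f"{p_value:.4g}")
```

A small p-value indicates that the query set contains more pathway members than random sampling would be expected to yield; in practice the p-values are also corrected for the number of pathways tested.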
While our study provided significant findings, several limitations warrant careful consideration (Flatby et al., 2023; Sanderson, 2021; Yang et al., 2024).
1. Limitations of the MR study design: Although we employed an MR study design to minimize the influence of confounding factors, the possibility that other unmeasured variables affected the results cannot be entirely excluded. MR studies are effective in suggesting causal relationships, but they do not definitively establish direct causation. Therefore, further clinical studies are necessary to validate the association between the gut microbiome and lung adenocarcinoma.
2. Data analysis constraints: Single-cell transcriptomics and bulk RNA sequencing data carry inherent technical limitations and interpretation challenges, including data denoising, cell-type identification and annotation, data normalization, and the choice of appropriate statistical methods for differential analysis. In addition, because we used a suggestive p-value threshold (p < 1.0 × 10⁻⁵) to construct microbiota instruments, our findings should be interpreted with caution, although the consistently high F-statistics and concordant results from multiple sensitivity analyses support the robustness of the main causal estimates.
3. Limitations in predicting potential therapeutic targets: While we have preliminarily validated gene expression at the cellular level, additional functional experiments (such as in vitro knockdown or overexpression of CAMK1D, microbiota manipulation, and in vivo LUAD models, e.g., xenograft or genetically engineered mouse models) will be crucial to directly test the mechanistic role of CAMK1D and the microbiota-gut-lung axis in tumor initiation and progression.
Despite the valuable insights provided by this study, it is crucial to interpret our findings within the context of these limitations.
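The relationship between the suggestive p-value threshold and instrument strength can be sketched with the common per-SNP approximation F = (beta / SE)². This approximation and the conventional F > 10 rule of thumb are assumptions here, since this section does not give the exact formula used in the study. The SNP identifiers and effect sizes are hypothetical.

```python
import math

def instrument_stats(beta, se):
    """Return (F-statistic, two-sided normal p-value) for one SNP-exposure effect."""
    z = beta / se
    f_stat = z ** 2
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return f_stat, p_value

def select_instruments(snps, p_threshold=1e-5):
    """Keep SNPs whose exposure association passes the suggestive threshold."""
    kept = []
    for snp in snps:
        f_stat, p_value = instrument_stats(snp["beta"], snp["se"])
        if p_value < p_threshold:
            kept.append({"snp": snp["snp"], "F": f_stat, "p": p_value})
    return kept

# Hypothetical SNP-taxon associations.
snps = [
    {"snp": "snp_1", "beta": 0.45, "se": 0.10},
    {"snp": "snp_2", "beta": 0.40, "se": 0.10},   # p above threshold, dropped
    {"snp": "snp_3", "beta": -0.60, "se": 0.12},
]

kept = select_instruments(snps)
for row in kept:
    print(row["snp"], round(row["F"], 2), f"{row['p']:.2e}")
```

Under this approximation, any SNP passing p < 1.0 × 10⁻⁵ necessarily has an F-statistic above 10, so the relaxed threshold mainly raises the risk of false-positive instruments rather than weak ones.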
Future research should prioritize in vivo models and longitudinal clinical trials to translate these findings into personalized interventions.
In conclusion, our study combined MR and bioinformatics analysis to explore the primary mechanisms by which the gut microbiome influences lung adenocarcinoma. This approach enabled us to identify gut microbial features associated with lung adenocarcinoma. Additionally, we identified genes associated with lung adenocarcinoma by mapping the instrumental variables (SNPs) to genes. This suggests that the gut microbiome may influence lung adenocarcinoma pathogenesis by modulating these genes.
Conclusions
Our study provides genetic evidence for a potential causal link between gut microbiota (Prevotella9 and Parabacteroides) and lung adenocarcinoma (LUAD) pathogenesis through bidirectional Mendelian randomization analysis. By integrating multi-omics approaches, including GWAS, single-cell transcriptomics, and bulk RNA sequencing, we identified CAMK1D as a central hub gene dysregulated in LUAD, with validation by RT-qPCR across multiple cell lines. Mechanistically, our analyses suggest that CAMK1D may mediate tumor-promoting effects through interactions with the RAAS pathway and may contribute to immunosuppression within the tumor microenvironment, particularly in T/B lymphocytes. These findings not only point to a microbiota-gut-lung axis in LUAD development but also nominate CAMK1D as a promising candidate biomarker and potential therapeutic target for precision oncology. Future studies should focus on validating CAMK1D-directed therapies and exploring microbiota modulation strategies for LUAD intervention.
Supplemental Information
Supplemental Information 1. MR-PRESSO analysis for the association between gut microbiota and outcomes. DOI: 10.7717/peerj.20985/supp-1
Supplemental Information 2. The heterogeneity of gut microbiota instrumental variables. DOI: 10.7717/peerj.20985/supp-2
Supplemental Information 3. MR estimates for the association between gut microbiota and lung adenocarcinoma. DOI: 10.7717/peerj.20985/supp-3
Supplemental Information 4. The pleiotropy of gut microbiota instrumental variables (after filtering using the IVW method). DOI: 10.7717/peerj.20985/supp-4
Supplemental Information 5. MR estimates for the association between gut microbiota and survival of lung adenocarcinoma. DOI: 10.7717/peerj.20985/supp-5
Supplemental Information 6. The correspondence between SNP and gene. DOI: 10.7717/peerj.20985/supp-6
Supplemental Information 7. Identification of differentially expressed genes in the GSE229705 dataset (based on p-value < 0.01 and |logFC| > 0.5). DOI: 10.7717/peerj.20985/supp-7
Supplemental Information 8. Protein-protein interaction (PPI) network protein relevance values predicted by STRING. DOI: 10.7717/peerj.20985/supp-8
Supplemental Information 9. Principal component and pseudotime analyses of single-cell RNA-seq data from normal lung and LUAD tissues (GSE131907). (A) Selection of highly variable genes in normal and tumor samples based on the relationship between average expression and standardized variance; the top 1,500 variable genes were used for downstream analyses. (B) Top loading genes for the first four principal components (PC1-PC4) in normal and tumor datasets, highlighting major drivers of transcriptional heterogeneity. (C) Heatmaps showing the expression patterns of high-loading genes across PCs 1-4 in normal and tumor tissues, illustrating distinct gene programs captured by each PC. (D) JackStraw plots assessing the statistical significance of PCs in normal and tumor cells; PCs with significant enrichment (p < 0.05) were retained for clustering and dimensionality reduction. (E) Heatmaps of PC scores across cells in normal and tumor samples, used to confirm the separation of major cell populations along the selected PCs. (F) Monocle-based pseudotime trajectories for normal and tumor datasets, with cells ordered along developmental paths and colored by state and by major cell type (including NK cells, T cells, B cells, monocytes/macrophages, fibroblasts, epithelial cells, endothelial cells, and pre-B CD34- cells), illustrating lineage relationships and dynamic transcriptional changes in the LUAD microenvironment. DOI: 10.7717/peerj.20985/supp-9
Supplemental Information 10. Raw data of RT-qPCR. DOI: 10.7717/peerj.20985/supp-10
Supplemental Information 11. Raw data of Code. DOI: 10.7717/peerj.20985/supp-11
Supplemental Information 12. MIQE checklist. DOI: 10.7717/peerj.20985/supp-12
Supplemental Information 13. STROBE Checklist. DOI: 10.7717/peerj.20985/supp-13
References
- 1. Agrawal A, Balci H, Hanspers K, Coort SL, Martens M, Slenter DN, Ehrhart F, Digles D, Waagmeester A, Wassink I, Abbassi-Daloii T, Lopes EN, Iyer A, Acosta JM, Willighagen LG, Nishida K, Riutta A, Basaric H, Evelo CT, Willighagen EL, Kutmon M, Pico AR. 2024. WikiPathways 2024: next generation pathway database. Nucleic Acids Research 52:D679-D689. DOI: 10.1093/nar/gkad960. PMID: 37941138. PMCID: PMC10767877.
- 2. Aran D, Looney AP, Liu L, Wu E, Fong V, Hsu A, Chak S, Naikawadi RP, Wolters PJ, Abate AR, Butte AJ, Bhattacharya M. 2019. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nature Immunology 20:163-172. DOI: 10.1038/s41590-018-0276-y. PMID: 30643263. PMCID: PMC6340744.
- 3. Barta JA, Powell CA, Wisnivesky JP. 2019. Global epidemiology of lung cancer. Annals of Global Health 85:8. DOI: 10.5334/aogh.2419. PMID: 30741509. PMCID: PMC6724220.
- 4. Bouras E, Karhunen V, Gill D, Huang J, Haycock PC, Gunter MJ, Johansson M, Brennan P, Key T, Lewis SJ, Martin RM, Murphy N, Platz EA, Travis R, Yarmolinsky J, Zuber V, Martin P, Katsoulis M, Freisling H, Nost TH, Schulze MB, Dossus L, Hung RJ, Amos CI, Ahola-Olli A, Palaniswamy S, Mannikko M, Auvinen J, Herzig KH, Keinanen-Kiukaanniemi S, Lehtimaki T, Salomaa V, Raitakari O, Salmi M, Jalkanen S, Consortium P, Jarvelin MR, Dehghan A, Tsilidis KK. 2022. Circulating inflammatory cytokines and risk of five cancers: a Mendelian randomization analysis. BMC Medicine 20:3. DOI: 10.1186/s1291…
- 5. Bowden J, Davey Smith G, Burgess S. 2015. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. International Journal of Epidemiology 44:512-525. DOI: 10.1093/ije/dyv080. PMID: 26050253. PMCID: PMC4469799.
- 6. Bowden J, Davey Smith G, Haycock PC, Burgess S. 2016. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genetic Epidemiology 40:304-314. DOI: 10.1002/gepi.21965. PMID: 27061298. PMCID: PMC4849733.
- 7. Bowden J, Holmes MV. 2019. Meta-analysis and Mendelian randomization: a review. Research Synthesis Methods 10:486-496. DOI: 10.1002/jrsm.1346. PMID: 30861319. PMCID: PMC6973275.
- 8. Burgess S, Small DS, Thompson SG. 2017. A review of instrumental variable estimators for Mendelian randomization. Statistical Methods in Medical Research 26:2333-2355. DOI: 10.1177/0962280215597579. PMID: 26282889. PMCID: PMC5642006.
