Screening and identification of key serum biomarkers between PM2.5 and-induced asthma onset
Yong Tang, Liangjun Li, Wenjie Hao, Hongyan Xu, Xiangyun Deng

TL;DR
This study identifies five key genes that may explain how PM2.5 pollution contributes to asthma development.
Contribution
The study introduces five hub genes that mediate the link between PM2.5 and asthma onset using machine learning and gene analysis.
Findings
715 differentially expressed genes were identified, with 118 in a significant asthma-related module.
Five hub genes (CEBPE, HDC, IRAK3, PRR4, SOD2) were confirmed as independent predictors of asthma.
These genes showed significant mediation between PM2.5 levels and asthma onset.
Abstract
The mechanism by which air pollution causes the onset of asthma is complex, and its key targets have not yet been fully identified. In this study, we identified the factors that mediate the relationship between air pollution and asthma. We screened overlapping genes related to asthma from the Gene Expression Omnibus (GEO) database by integrating differentially expressed genes (DEGs) and weighted gene co-expression network analysis (WGCNA). To further identify hub genes, we used three machine learning methods: least absolute shrinkage and selection operator (LASSO) regression, support vector machine-recursive feature elimination (SVM-RFE), and random forest (RF). We subsequently analyzed the mediating role of these hub genes in the association between PM2.5 and onset of asthma in a study population consisting of 160 participants with asthma and 160 participants without asthma. The…
Table 1. GEO datasets included in this study.

| Datasets | Time | Asthma group | Control group | Platforms | Organism | Experiment type |
|---|---|---|---|---|---|---|
| | Jan 22, 2015 | 36 | 33 | | Homo sapiens | Array |
| | Jul 19, 2019 | 46 | 21 | | Homo sapiens | Array |
| | Sep 11, 2019 | 54 | 15 | | Homo sapiens | Array |

Table 2. Baseline characteristics of the participants (mean ± SD or n (%)).

| Factors | Indicator | Control group (n = 160) | Asthma group (n = 160) | χ²/t | p |
|---|---|---|---|---|---|
| Age (years) | | 55.11 ± 12.09 | 56.19 ± 11.71 | 0.812 | 0.418 |
| Sex | Male | 75 (46.875) | 83 (51.875) | 0.800 | 0.371 |
| | Female | 85 (53.125) | 77 (48.125) | | |
| BMI (kg/m²) | | 23.89 ± 2.62 | 23.33 ± 3.09 | 1.748 | 0.081 |
| Smoking history | Yes | 50 (31.250) | 55 (34.375) | 0.354 | 0.552 |
| | No | 110 (68.750) | 105 (65.625) | | |
| Family history | Yes | 33 (20.625) | 21 (13.125) | 3.208 | 0.073 |
| | No | 127 (79.375) | 139 (86.875) | | |
| Diabetes | Yes | 79 (49.375) | 92 (57.500) | 2.123 | 0.145 |
| | No | 81 (50.625) | 68 (42.500) | | |

Table 3. Univariate logistic regression analysis of the five hub genes.

| Factors | B | SE | Wald | p | Exp(B) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper |
|---|---|---|---|---|---|---|---|
| CEBPE | 2.640 | 0.264 | 100.031 | <0.001 | 14.018 | 8.355 | 23.517 |
| HDC | 2.470 | 0.249 | 98.412 | <0.001 | 11.817 | 7.254 | 19.248 |
| IRAK3 | 2.811 | 0.304 | 85.217 | <0.001 | 16.620 | 9.151 | 30.185 |
| PRR4 | 2.231 | 0.222 | 101.424 | <0.001 | 9.311 | 6.031 | 14.374 |
| SOD2 | 3.112 | 0.426 | 53.363 | <0.001 | 22.470 | 9.749 | 51.790 |

Table 4. Multivariate logistic regression analysis of the five hub genes.

| Factors | B | SE | Wald | p | Exp(B) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper |
|---|---|---|---|---|---|---|---|
| CEBPE | 0.940 | 0.417 | 5.086 | 0.024 | 2.560 | 1.131 | 5.793 |
| HDC | 0.860 | 0.435 | 3.915 | 0.048 | 2.364 | 1.008 | 5.544 |
| IRAK3 | 0.918 | 0.417 | 4.855 | 0.028 | 2.504 | 1.107 | 5.665 |
| PRR4 | 0.946 | 0.434 | 4.744 | 0.029 | 2.576 | 1.099 | 6.036 |
| SOD2 | 3.204 | 0.928 | 11.922 | 0.001 | 24.629 | 3.996 | 151.797 |

Table 5. Multivariate logistic regression analysis of the five hub genes after adjustment for confounding factors (sex and age).

| Factors | B | SE | Wald | p | Exp(B) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper |
|---|---|---|---|---|---|---|---|
| CEBPE | 0.895 | 0.377 | 5.641 | 0.018 | 2.448 | 1.169 | 5.123 |
| HDC | 1.066 | 0.405 | 6.934 | 0.008 | 2.904 | 1.313 | 6.422 |
| IRAK3 | 0.899 | 0.380 | 5.580 | 0.018 | 2.456 | 1.165 | 5.178 |
| PRR4 | 1.000 | 0.436 | 5.271 | 0.022 | 2.719 | 1.158 | 6.387 |
| SOD2 | 3.286 | 1.059 | 9.630 | 0.002 | 26.740 | 3.356 | 213.087 |

Table 6. Mediation analysis of the five hub genes in the association between PM2.5 and asthma onset (ACME, average causal mediation effect; ADE, average direct effect).

| Item | ACME_estimate | ACME_p | ACME_CI | ADE_estimate | ADE_p | Total_effect | Prop_mediated | Mediation_type |
|---|---|---|---|---|---|---|---|---|
| PM2.5 => CEBPE => Asthma | 0.288 | <0.001 | [0.214, 0.376] | 0.209 | <0.001 | 0.497 | 0.58 | Partial mediation |
| PM2.5 => HDC => Asthma | 0.283 | <0.001 | [0.215, 0.360] | 0.214 | <0.001 | 0.497 | 0.569 | Partial mediation |
| PM2.5 => PRR4 => Asthma | 0.248 | <0.001 | [0.186, 0.321] | 0.249 | <0.001 | 0.497 | 0.5 | Partial mediation |
| PM2.5 => IRAK3 => Asthma | 0.202 | <0.001 | [0.144, 0.266] | 0.295 | <0.001 | 0.497 | 0.406 | Partial mediation |
| PM2.5 => SOD2 => Asthma | 0.153 | <0.001 | [0.108, 0.197] | 0.344 | <0.001 | 0.497 | 0.307 | Partial mediation |
1 Introduction
Asthma is a common chronic inflammatory airway disease characterized primarily by recurrent narrowing and spasm of the airways, resulting in symptoms such as wheezing, shortness of breath, chest tightness, and coughing (1). The pathogenesis of asthma is complex and involves multiple factors. Genetic factors are an important contributor to its onset, and environmental factors also strongly influence the occurrence of the disease (2, 3). Epidemiological studies have reported that, owing to rapid industrialization and urbanization, air pollution has become increasingly severe, and that long-term exposure to polluted air can increase the risk of respiratory diseases, especially asthma (4–6). In asthma patients exposed to polluted air for long periods, airway sensitivity may increase further, so that even low concentrations of irritants may trigger attacks and increase their frequency and severity. Other environmental factors, including exposure to allergens, respiratory tract infections, and climate change, also play a role (7, 8) and can interact with air pollution to further influence the onset of asthma. However, the specific molecular mechanisms by which air pollution leads to the onset of asthma remain to be elucidated.
Many studies have examined the intrinsic connection between environmental air pollution and the onset of asthma (9, 10). Among them, the use of serum markers as intermediary variables has received considerable attention. Recently, Nahid Mostafavi et al. provided supportive evidence for a mediating effect of the immune system in the association between air pollution and adult-onset asthma (11). Several studies have also investigated the role of genes in this process, to understand how changes in gene expression interact with environmental factors to jointly affect the occurrence and development of asthma (12, 13). Using a prospective observational cohort study and a randomized, double-blind, placebo-controlled trial in an independent cohort, Matthew C. Altman’s study demonstrated that individual pollutants were significantly associated with altered gene expression in coordinated inflammatory pathways, including PM2.5 with increased epithelial induction of tissue kallikreins, mucus hypersecretion, and barrier functions, and O3 with increased type-2 inflammation (14). Identifying such mediators could therefore help prevent asthma onset and provide early warning of asthma attacks. It could also guide the formulation of public health policies that better protect vulnerable populations.
In this study, we assessed the roles of genes in the association between PM2.5 and the onset of asthma. We used the Gene Expression Omnibus (GEO) database to identify hub genes dysregulated in asthma through differentially expressed gene (DEG) analysis, weighted gene co-expression network analysis (WGCNA), and machine learning methods. Data were then collected from 160 participants with asthma and 160 participants without asthma. Univariate and multivariate logistic regression analyses were conducted to screen for factors influencing the onset of asthma. The mediating role of the hub genes in the association between PM2.5 and asthma onset was then evaluated using the bootstrap method. This study provides new theoretical insights into the pathogenesis of asthma and identifies potential targets for future intervention strategies.
2 Methods
2.1 Data collection and preprocessing
Gene expression datasets were retrieved from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). Three independent datasets (GSE65204, GSE134544, and GSE137268) were included in this study. The datasets were selected based on the following criteria:
(1) publicly available in the GEO database;
(2) derived from human subjects;
(3) containing both asthma and healthy control groups;
(4) including airway-relevant or immunologically informative sample types;
(5) having well-defined sample sizes and annotations.

The raw expression data were normalized and preprocessed using the limma package in R (version 4.2.1). Probes were annotated according to platform-specific annotation files, and multiple probes mapping to the same gene were collapsed to their median expression value. The dataset information is shown in Table 1.
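The median-based collapsing of duplicate probes can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R/limma code. The probe IDs, gene assignments, and values are hypothetical. Probes without an annotated gene symbol are simply dropped, which the text does not specify.

```python
from collections import defaultdict
from statistics import median


def collapse_probes(expr, probe_to_gene):
    """Collapse probe-level expression to gene level using the per-sample median.

    expr: dict mapping probe ID -> list of expression values (one per sample)
    probe_to_gene: dict mapping probe ID -> gene symbol (platform annotation)
    """
    by_gene = defaultdict(list)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene:  # assumption: unannotated probes are discarded
            by_gene[gene].append(values)

    collapsed = {}
    for gene, rows in by_gene.items():
        # zip(*rows) walks sample by sample across all probes of this gene.
        collapsed[gene] = [median(column) for column in zip(*rows)]
    return collapsed


# Hypothetical data: two probes map to SOD2, one to HDC, one is unannotated.
expr = {
    "p1": [7.0, 8.0, 9.0],
    "p2": [5.0, 6.0, 7.0],
    "p3": [3.0, 3.5, 4.0],
    "p4": [1.0, 1.0, 1.0],
}
annotation = {"p1": "SOD2", "p2": "SOD2", "p3": "HDC", "p4": ""}
gene_expr = collapse_probes(expr, annotation)
print(gene_expr)
```

Taking the per-sample median rather than the mean keeps a single outlying probe from dominating the gene-level value.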
2.2 Differential expression analysis
To identify DEGs between the asthma and control groups in the GSE65204 dataset, we used the empirical Bayes method implemented in the limma package. Genes with an absolute log₂ fold change (|log₂FC|) > 0.25 and p < 0.05 were considered significant. To retain more DEGs for subsequent analysis, we used nominal p values rather than adjusted p values. DEGs were visualized using volcano plots and heatmaps generated with the ggplot2 and pheatmap packages, respectively.
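The thresholding step is straightforward to show outside R. The sketch below assumes the limma results (for example, rows exported from `topTable`) are already available as (gene, log2FC, p) tuples. It does not reproduce limma's empirical Bayes moderation. The gene names and values are hypothetical.

```python
def select_degs(results, lfc_cutoff=0.25, p_cutoff=0.05):
    """Split genes passing |log2FC| > lfc_cutoff and nominal p < p_cutoff
    into upregulated and downregulated lists."""
    up, down = [], []
    for gene, lfc, p in results:
        if p < p_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical rows: (gene, log2 fold change, nominal p value).
rows = [
    ("GENE1", 0.80, 0.001),   # passes, upregulated
    ("GENE2", -0.40, 0.020),  # passes, downregulated
    ("GENE3", 0.20, 0.001),   # fails: |log2FC| too small
    ("GENE4", 1.50, 0.200),   # fails: p too large
]
up, down = select_degs(rows)
print(up, down)
```

Both thresholds are strict inequalities, matching the "> 0.25" and "< 0.05" wording above.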
2.3 Weighted gene co-expression network analysis
To identify gene modules associated with asthma, WGCNA was performed on the GSE137268 dataset using the WGCNA package. After filtering for the top 5,000 most variable genes based on median absolute deviation (MAD), a scale-free topology model was constructed. The soft-thresholding power was determined using the pickSoftThreshold function. Modules were identified by average linkage hierarchical clustering and dynamic tree cutting with a minimum module size of 30. Module-trait relationships were calculated by correlating module eigengenes with the status of asthma. Modules showing the strongest correlation were retained for downstream analysis.
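The gene filtering and soft-thresholding ideas behind this step can be illustrated in plain Python. This is a simplified sketch, not the WGCNA package, and it makes three simplifications:
- It ranks genes by unscaled median absolute deviation and keeps the top n (n = 3 here instead of 5,000).
- It builds an unsigned adjacency |cor|^β from Pearson correlations.
- It omits the topological overlap matrix, the scale-free fit used by pickSoftThreshold, and the dynamic tree cut.

The expression values are hypothetical. β = 12 is the power reported in the WGCNA results.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation."""
    center = median(values)
    return median(abs(v - center) for v in values)


def top_variable_genes(expr, n_top):
    """Names of the n_top genes with the largest MAD across samples."""
    return sorted(expr, key=lambda g: mad(expr[g]), reverse=True)[:n_top]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def soft_threshold_adjacency(expr, genes, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for all ordered gene pairs."""
    return {(gi, gj): abs(pearson(expr[gi], expr[gj])) ** beta
            for gi in genes for gj in genes if gi != gj}


# Hypothetical expression matrix: gene -> values across four samples.
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [5, 5, 5, 6], "E": [1, 3, 2, 4]}
genes = top_variable_genes(expr, 3)  # C (nearly constant) is filtered out
adj = soft_threshold_adjacency(expr, genes, beta=12)
print(genes, round(adj[("A", "E")], 4))
```

Raising |cor| to a high power such as 12 shrinks weak correlations toward zero while largely preserving strong ones. In this example a correlation of 0.8 becomes about 0.07, while 1.0 stays at 1.0. This is what pushes the network toward a scale-free topology.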
2.4 Machine learning for biomarker selection
Three machine learning algorithms were applied to the 18 intersecting candidate genes using the GSE134544 dataset:
- least absolute shrinkage and selection operator (LASSO) regression;
- support vector machine-recursive feature elimination (SVM-RFE);
- random forest (RF).

LASSO regression was performed using the glmnet package, with 10-fold cross-validation to select the optimal lambda. SVM-RFE was conducted using the caret package, and the top-ranked features based on cross-validation accuracy were retained. RF analysis was performed using the randomForest package, and genes with an importance score > 1.5 were selected. Candidate genes selected by all three algorithms were considered robust diagnostic biomarkers.
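The consensus rule (keep only genes chosen by LASSO, SVM-RFE, and RF) is a set intersection. In the sketch below, only the five shared genes come from the Results. The GENE_* entries are hypothetical placeholders, sized to match the reported selections of 7 (LASSO), 12 (RF), and 10 (SVM-RFE) genes.

```python
# Genes retained by each algorithm (hypothetical except for the five shared genes).
shared = {"CEBPE", "HDC", "IRAK3", "PRR4", "SOD2"}
lasso = shared | {"GENE_A", "GENE_B"}
rf = shared | {"GENE_A", "GENE_C", "GENE_D", "GENE_E", "GENE_F", "GENE_G", "GENE_H"}
svm_rfe = shared | {"GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_I"}


def consensus(*selections):
    """Genes chosen by every feature-selection method."""
    return set.intersection(*selections)


robust = consensus(lasso, rf, svm_rfe)
print(len(lasso), len(rf), len(svm_rfe), sorted(robust))
```

Genes picked by only one or two methods (GENE_A to GENE_I here) drop out, which is the rationale for treating the intersection as more robust than any single selector.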
2.5 Gene set enrichment analysis
To investigate the biological functions of the identified biomarkers, GSEA was performed on each gene using the GSE65204 dataset. For each gene, samples were divided into high-expression and low-expression groups based on median expression. KEGG pathway enrichment was conducted using the GSEA software (version 4.1.0), with 1,000 permutations. Significantly enriched pathways were defined by a nominal p < 0.05 and FDR < 0.25. Enrichment plots were generated to visualize the top-ranked pathways for each biomarker.
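The per-gene design can be sketched as below: a median split of samples, followed by pathway-level enrichment. This is an illustrative Python version of the standard weighted running-sum enrichment score used by GSEA, not the GSEA software itself. It assumes genes have already been ranked by a metric comparing the high and low groups. It omits the 1,000-permutation significance test, score normalization, and FDR. Sample and gene names are hypothetical.

```python
from statistics import median


def median_split(samples, expression):
    """Label each sample 'high' or 'low' by one gene's median expression.
    Samples exactly at the median go to 'low' (tie handling is an assumption)."""
    cut = median(expression[s] for s in samples)
    return {s: ("high" if expression[s] > cut else "low") for s in samples}


def enrichment_score(ranked_genes, metric, gene_set, weight=1.0):
    """Weighted running-sum enrichment score (ES) for one gene set.

    ranked_genes: genes ordered from most associated with the 'high' group to
    most associated with the 'low' group; metric: ranking values in that order.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_norm = sum(abs(v) ** weight for v, hit in zip(metric, in_set) if hit)
    running, es = 0.0, 0.0
    for v, hit in zip(metric, in_set):
        # Step up by the weighted metric for hits, down by a constant for misses.
        running += abs(v) ** weight / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running  # ES is the maximum deviation from zero
    return es


groups = median_split(["s1", "s2", "s3", "s4"],
                      {"s1": 1.0, "s2": 5.0, "s3": 3.0, "s4": 7.0})
es = enrichment_score(["A", "B", "C", "D"], [4.0, 3.0, 2.0, 1.0], {"A", "B"})
print(groups, round(es, 3))
```

A gene set concentrated at the top of the ranking gives a positive ES, and one concentrated at the bottom gives a negative ES.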
2.6 Participants
This study was conducted at Jiayuguan First People’s Hospital in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of Jiayuguan First People’s Hospital. Its content and procedures complied with institutional ethical committee standards. All participants voluntarily provided signed informed consent. From January 2021 to December 2024, 320 participants were enrolled: 160 participants with asthma and 160 participants without asthma. Participants who met the criteria were recruited and evaluated.

The inclusion criteria were residence in the city for five consecutive years and no history of other significant cardiopulmonary diseases. The exclusion criteria were:
- pulmonary nodules, bronchiectasis, chronic obstructive pulmonary disease, chronic bronchitis, cystic fibrosis, or lung cancer;
- altered mental status, consciousness disorders, or other psychiatric conditions;
- hematologic malignancies;
- unwillingness to participate or incomplete clinical data.

Patients with asthma met the diagnostic criteria of the Guidelines for Bronchial Asthma Prevention and Management (2020 edition) (15):
① Recurrent wheezing, coughing, shortness of breath, and chest tightness. These are often associated with exposure to allergens, cold air, or physical and chemical irritants, respiratory tract infection, exercise, or hyperventilation, and often occur or worsen at night and/or in the early morning.
② During an attack, scattered or diffuse wheezing, mainly in the expiratory phase, can be heard in both lungs, with a prolonged expiratory phase.
③ The above symptoms and signs improve with anti-asthma treatment or resolve spontaneously.
④ Wheezing, coughing, shortness of breath, and chest tightness caused by other diseases are excluded.
⑤ Patients with atypical clinical manifestations (such as no obvious wheezing or wheezing sounds) must show at least one of the following:
(1) Confirmed reversible airflow limitation, shown by either:
(a) a positive bronchodilator test: 15 minutes after inhalation of a short-acting β2-agonist (such as 200–400 μg salbutamol by pressurized metered-dose inhaler), the forced expiratory volume in the first second (FEV1) increases by more than 12% and by more than 200 mL in absolute terms; or
(b) improved pulmonary ventilation after anti-inflammatory treatment: FEV1 increases by more than 12% after 4 to 8 weeks of inhaled glucocorticoids and/or anti-leukotriene drugs.
(2) A positive bronchial provocation test.
(3) A daytime variation rate or diurnal fluctuation rate of peak expiratory flow (PEF) greater than 20%.

Patients who met items ①–④, or items ④ and ⑤, were diagnosed with asthma.
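The bronchodilator criterion (a) combines a relative and an absolute threshold, which is easy to misapply. A minimal check is sketched below. The FEV1 values are hypothetical, and the function covers only criterion (a), not the other objective tests.

```python
def positive_bronchodilator_test(fev1_pre_ml, fev1_post_ml):
    """Criterion (a): FEV1 rises by more than 12% AND by more than 200 mL
    after a short-acting beta2-agonist."""
    gain_ml = fev1_post_ml - fev1_pre_ml
    gain_pct = 100.0 * gain_ml / fev1_pre_ml
    return gain_pct > 12.0 and gain_ml > 200.0


# Hypothetical pre/post FEV1 values in mL.
for pre, post in [(2000, 2300), (2000, 2220), (1500, 1690)]:
    print(pre, post, positive_bronchodilator_test(pre, post))
```

The three cases give different results:
- The first case passes both thresholds.
- The second case fails the relative threshold (11%).
- The third case fails the absolute threshold (190 mL), despite a gain of about 12.7%.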
2.7 Air pollution status
This study followed the basic standards specified in the Ambient Air Quality Standards (GB 3095-2012). Monitoring data on fine particulate matter (PM2.5) for the participants’ residential areas during the corresponding period were obtained from the local environmental monitoring department. The risk of asthma in this region was assessed based on the obtained air pollutant concentrations (16).
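The text does not state how each participant's exposure was derived from the monitoring data. As one illustration, the sketch below assumes exposure is a simple mean of daily PM2.5 readings for a residential area. It compares that mean with the GB 3095-2012 PM2.5 limits:
- 24-hour mean: 35 μg/m³ (Grade I) and 75 μg/m³ (Grade II);
- annual mean: 15 μg/m³ (Grade I) and 35 μg/m³ (Grade II).

The readings are hypothetical, and comparing a short-period mean with an annual limit is only indicative.

```python
from statistics import mean

# PM2.5 concentration limits in GB 3095-2012 (ug/m3).
PM25_LIMITS = {
    "24h": {"grade_I": 35.0, "grade_II": 75.0},
    "annual": {"grade_I": 15.0, "grade_II": 35.0},
}


def exposure_summary(daily_pm25):
    """Mean PM2.5, number of days above the Grade II 24-hour limit, and whether
    the mean exceeds the Grade II annual limit (assumed exposure metric)."""
    average = mean(daily_pm25)
    days_over = sum(1 for v in daily_pm25 if v > PM25_LIMITS["24h"]["grade_II"])
    over_annual = average > PM25_LIMITS["annual"]["grade_II"]
    return average, days_over, over_annual


# Hypothetical daily readings (ug/m3) for one residential area.
print(exposure_summary([20, 40, 80, 100, 10]))
```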
2.8 Blood sample collection
Blood samples (5 mL) were collected from patients with asthma into vacuum tubes containing a coagulant within 1 day of an asthma exacerbation. For participants without asthma, 5 mL of fasting antecubital venous blood was collected in the morning. After centrifugation at 3,000 rpm for 15 min at 4 °C, serum samples were transferred to RNase/DNase-free tubes and immediately frozen at −80 °C.
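The centrifugation speed is given in rpm. The equivalent relative centrifugal force depends on the rotor radius, which the text does not report. The standard conversion RCF = 1.118 × 10⁻⁵ × r (cm) × rpm² is sketched below with a hypothetical radius.

```python
def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (x g): RCF = 1.118e-5 * r(cm) * rpm^2."""
    return 1.118e-5 * radius_cm * rpm ** 2


# 3,000 rpm as in the protocol; the 10 cm rotor radius is hypothetical.
print(round(rpm_to_rcf(3000, 10)))
```

Because RCF scales linearly with radius, the same 3,000 rpm gives a proportionally different g-force on a rotor of different size.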
2.9 RNA extraction and quantitative real-time polymerase chain reaction
Total RNA was extracted from the serum samples using TRIzol reagent (Takara, Dalian, China) following the manufacturer’s protocol. The RNA was then reverse-transcribed to cDNA using a PrimeScript RT-PCR kit (Takara). Real-time qPCR was performed using a standard SYBR Green PCR kit (Takara) following the manufacturer’s protocol. The relative expression of genes was calculated using the 2^−ΔΔCt method (17), with GAPDH as the internal control. The primer sequences were as follows:

| Gene | Forward primer (5′→3′) | Reverse primer (5′→3′) |
|---|---|---|
| CCAAT/enhancer binding protein ε (CEBPE) | ATC TCT TTG CCG TGA AGC CA | TCT GCT GCG TCT CCA GAA TG |
| Histidine decarboxylase (HDC) | CCA TCT GTG CCC GTG AGG | CGA AAA ACC ACC AGG CCA AG |
| Interleukin-1 receptor-associated kinase 3 (IRAK3) | AGG ATT TCC GCG GTT GTG TA | TCG ATG TCC CAT CTC CT |
| Proline-rich protein 4 (PRR4) | TGT GTC CTC ACC CAC TGT | GAG AGT TGA CGG TGT CCT CG |
| Manganese superoxide dismutase (SOD2) | TCT GGC CCA CTC ACA GGA G | CTC GGT GAC GTT CAG GTT GT |
| Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) | GGA TTT GGT CGT ATT GGG CG | TCC CGT TCT CAG CCA TGT AG |
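The 2^−ΔΔCt calculation with GAPDH as the reference gene can be written out as follows. The Ct values are hypothetical. The choice of calibrator (for example, the mean of the control group) is an assumption, since the text does not specify it. The method also assumes near-100% amplification efficiency for both target and reference genes.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change by the 2^-ddCt method.

    ct_target, ct_ref: Ct of the gene of interest and of GAPDH in the sample.
    ct_target_cal, ct_ref_cal: the same two Ct values in the calibrator.
    """
    delta_ct_sample = ct_target - ct_ref  # normalise to the reference gene
    delta_ct_calibrator = ct_target_cal - ct_ref_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: after normalisation the target amplifies 2 cycles
# earlier in the sample than in the calibrator, i.e. about 4-fold higher.
print(relative_expression(25.0, 20.0, 27.0, 20.0))
```

Each cycle of difference corresponds to a doubling, which is why a ΔΔCt of −2 gives a 4-fold increase.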
2.10 Mediation analysis
Mediation analysis was conducted in R using the mediation package. First, a basic regression model was constructed with the air pollution level as the independent variable and asthma incidence as the dependent variable, and the direct association between air pollution and asthma was analyzed. Then, genes were introduced into the model one at a time as mediator variables, and their mediating effects were tested using the bootstrap method with 1,000 bootstrap samples. Mediation was classified as follows:
- Complete mediation: average causal mediation effect (ACME) p < 0.05 and average direct effect (ADE) p > 0.05.
- Partial mediation: ACME p < 0.05 and ADE p < 0.05.
- No mediating effect: ACME p > 0.05 and ADE p > 0.05.

The overall study workflow is shown in Figure 1.
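The bootstrap idea in this step can be illustrated with a simpler, swapped-in technique: a percentile bootstrap confidence interval for the product-of-coefficients indirect effect a·b in linear models. This is not what the R mediation package's mediate() computes for a binary outcome such as asthma. That function estimates the ACME and ADE from the fitted mediator and outcome models. Treat the block below only as a sketch of resampling participants 1,000 times. The data are synthetic and hypothetical.

```python
import random


def avg(values):
    return sum(values) / len(values)


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a * b for two linear models:
    m = i1 + a*x + e1  and  y = i2 + c'*x + b*m + e2."""
    mx, mm, my = avg(x), avg(m), avg(y)
    sxx = sum((u - mx) ** 2 for u in x)
    smm = sum((v - mm) ** 2 for v in m)
    sxm = sum((u - mx) * (v - mm) for u, v in zip(x, m))
    sxy = sum((u - mx) * (w - my) for u, w in zip(x, y))
    smy = sum((v - mm) * (w - my) for v, w in zip(m, y))
    a = sxm / sxx                                         # slope of m on x
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)  # partial slope of y on m
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for a*b, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[round(alpha / 2 * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic data: exposure x, mediator m, continuous outcome y.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(300)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.2 * xi + 0.4 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

estimate = indirect_effect(x, m, y)   # true indirect effect here is 0.5 * 0.4 = 0.2
lower, upper = bootstrap_ci(x, m, y)  # 1,000 resamples, as in the text
print(round(estimate, 3), (round(lower, 3), round(upper, 3)))
```

A percentile interval that excludes zero is the bootstrap analogue of a significant indirect (mediated) effect.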
Overall flowchart of the study.
2.11 Statistical analysis
The data of the participants were compiled and analyzed using Statistical Product and Service Solutions (SPSS, IBM Corp., Armonk, NY, United States), version 23.0. The general data of the two groups were compared by conducting the Chi-square test or the Wilcoxon signed-rank test. The Pearson correlation coefficient was used to evaluate the relationship between the PM2.5 concentration and gene expression level. Univariate and multivariate logistic regression analyses were conducted to screen for factors related to asthma. All results were considered to be statistically significant at p < 0.05.
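The χ² values for the categorical variables in Table 2 are consistent with an uncorrected Pearson χ² test (no continuity correction) on the 2×2 counts. The standard-library sketch below recomputes them. The p value uses the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square (df = 1) for the 2x2 table
        [[a, b],   # category 1: control, asthma
         [c, d]]   # category 2: control, asthma
    Returns (chi2, p)."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For df = 1, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts from Table 2, given as (control, asthma) for each of the two categories.
table2 = {
    "Sex (male/female)": (75, 83, 85, 77),
    "Smoking history (yes/no)": (50, 55, 110, 105),
    "Family history (yes/no)": (33, 21, 127, 139),
    "Diabetes (yes/no)": (79, 92, 81, 68),
}
for factor, counts in table2.items():
    chi2, p = pearson_chi2_2x2(*counts)
    print(f"{factor}: chi2 = {chi2:.3f}, p = {p:.3f}")
```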
3 Results
3.1 Identification of differentially expressed genes
To identify DEGs between asthma patients and healthy controls, we analyzed the gene expression profiles from the GSE65204 dataset obtained from the GEO database. The DEGs were analyzed using the limma package, with the thresholds set at p < 0.05 and |log₂FC| > 0.25. In the GSE65204 dataset, 715 DEGs were identified, including 363 upregulated and 352 downregulated genes. A volcano plot was used to highlight these DEGs (Figure 2A), and the expression profiles of the top 100 DEGs were visualized using a heatmap (Figure 2B), which revealed distinct clustering between the asthma and control samples.
Differentially expressed genes (DEGs) in the GSE65204 dataset and WGCNA of the GSE137268 dataset. (A) The volcano plot of GSE65204 shows DEGs with thresholds of |log2FC| > 0.25 and p < 0.05. Red, upregulated genes; green, downregulated genes; black, non-significant genes. (B) The heatmap illustrates the top 100 DEGs in GSE65204. (C) Sample dendrogram and trait heatmap. (D,E) Determination of soft-thresholding power based on scale-free topology (left) and mean connectivity (right). (F) Gene dendrogram with module colors after dynamic tree cut. (G) Module-trait relationships showing the correlation between module eigengenes and asthma status. (H) Identification of co-expression modules associated with asthma. (I) The Venn diagram shows 18 overlapping genes from the DEGs and WGCNA modules.
3.2 Weighted gene co-expression network analysis
To further assess gene co-expression patterns and identify modules associated with asthma, we performed WGCNA on the GSE137268 dataset. Hierarchical clustering of the samples and a trait heatmap confirmed the consistency of the sample grouping (Figure 2C). A soft-thresholding power of 12 was selected based on the scale-free topology criterion and mean connectivity (Figures 2D,E). A hierarchical clustering dendrogram of genes was then constructed using the WGCNA package (Figure 2F). In this dendrogram, each short vertical line represents a gene and each color represents a module of genes with similar expression patterns. Three modules were identified (Figure 2G). Among them, the MEgrey module was most significantly associated with asthma (cor = 0.37, p < 0.001), as shown in the module-trait heatmap (Figure 2H). To refine potential diagnostic biomarkers, we intersected the DEGs with the genes in the WGCNA grey module. A total of 19 overlapping genes were identified and considered core asthma-related candidates (Figure 2I).
3.3 Identification of candidate genes via machine learning
The 19 identified genes underwent machine learning-based feature selection using the GSE134544 dataset.
- LASSO regression: the optimal lambda value was determined by 10-fold cross-validation (Figure 3A), resulting in the selection of 7 genes with non-zero coefficients (Figure 3B). These genes were considered to have the strongest penalized regression association with asthma status.
- RF: variable importance was computed using the Gini index, and 12 genes with importance scores > 1.5 were retained for further analysis (Figures 3C,D) as the most informative predictors for classifying asthma.
- SVM-RFE: recursive feature elimination was performed via cross-validation with an accuracy-based ranking criterion, and the top 10 ranked genes (AvgRank < 10) were selected as candidate features (Figures 3E,F).

We then performed ROC analysis to assess the predictive validity of the machine learning models. The AUCs were 0.924 (0.869–0.978) for LASSO regression, 1.000 (1.000–1.000) for the RF model, and 0.912 (0.826–0.998) for SVM-RFE, all well above 0.5 (Supplementary Figure S1). Finally, five genes (CEBPE, HDC, IRAK3, PRR4, and SOD2) were identified as overlapping features across all three algorithms, as illustrated in the Venn diagram (Figure 3G). They were therefore considered robust candidate biomarkers for asthma.
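The ROC AUC equals the probability that a randomly chosen asthma sample receives a higher model score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition using hypothetical scores. The confidence intervals reported above require an additional method (for example, DeLong's), which is not shown.

```python
def auc_mann_whitney(case_scores, control_scores):
    """ROC AUC as P(case score > control score), counting ties as 1/2.
    Equivalent to the Mann-Whitney U statistic divided by n_cases * n_controls."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted probabilities from a classifier.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(round(auc_mann_whitney(cases, controls), 3))
```

An AUC of 1, as reported for the RF model, means every case outscored every control in the evaluated samples.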
Candidate diagnostic biomarkers were identified via machine learning in the GSE134544 dataset. (A) Optimal lambda selection in LASSO regression. (B) LASSO coefficient profiles. (C,D) Random forest model: error rate plot and gene importance ranking. (E,F) SVM-RFE: cross-validation error and accuracy for different feature subsets. (G) The Venn diagram shows features selected by LASSO, RF, and SVM-RFE.
3.4 Gene set enrichment analysis of candidate biomarkers
To elucidate the biological roles of the five candidate biomarkers (CEBPE, HDC, IRAK3, PRR4, and SOD2), GSEA was performed on the GSE65204 dataset. The analysis revealed that these genes were consistently enriched in several immune-related and metabolic pathways, suggesting functional involvement in asthma pathogenesis.
- Asthma: all five genes were enriched in the asthma pathway.
- Immune pathways: most genes were involved in immune regulatory pathways and immune-related diseases, including allograft rejection, autoimmune thyroid disease, primary immunodeficiency, and graft-versus-host disease.
- Cytochrome P450: IRAK3 and SOD2 were enriched in cytochrome P450-related pathways.
- Metabolism: most genes were also linked to metabolic pathways, such as glycine, serine, and threonine metabolism; taurine and hypotaurine metabolism; and drug metabolism by cytochrome P450 (Figure 4).

These findings collectively support the functional relevance of the five candidate genes and suggest that they may contribute to asthma.
Gene set enrichment analysis (GSEA) of candidate diagnostic biomarkers in the GSE65204 dataset. (A–E) GSEA plots for CEBPE, HDC, IRAK3, PRR4, and SOD2, respectively. The samples were divided into high-expression and low-expression groups, and KEGG pathways were ranked based on enrichment scores. The top enriched pathways are indicated in each plot.
3.5 Comparison of the baseline characteristics between the two groups
A total of 160 participants with asthma and 160 participants without asthma were screened and included. The two groups showed no significant differences in baseline characteristics, such as age structure, sex ratio, BMI, smoking history, education level, family history, or complications (all p > 0.05), indicating good comparability. The baseline characteristics of the two groups are presented in Table 2.
3.6 Abundance of candidate genes in the blood
To determine the clinical significance of the candidate genes, we collected blood from participants with and without asthma for validation via qRT-PCR. CEBPE expression was significantly higher in asthma patients than in participants without asthma (p < 0.001) (Figure 5A). Serum HDC levels were also significantly higher in asthma patients than in participants without asthma (p < 0.001, Figure 5B). Similar patterns were observed for IRAK3, PRR4, and SOD2 (p < 0.001) (Figures 5C–E). The relationships between gene expression levels and PM2.5 levels were then evaluated using the Pearson correlation coefficient. CEBPE expression was significantly positively correlated with PM2.5 levels (Figure 6A). Similar patterns were found for HDC (Figure 6B), PRR4 (Figure 6C), SOD2 (Figure 6D), and IRAK3 (Figure 6E).
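The Pearson correlation and its usual significance test, t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, are sketched below with hypothetical values. Converting t to a p value requires a t-distribution function, which is not included here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical PM2.5 levels (ug/m3) and relative expression values.
pm25 = [30, 45, 60, 75]
expression = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(pm25, expression)
print(round(r, 3), round(r_to_t(r, len(pm25)), 3))
```

The same r gives a much larger t as the sample grows, because t scales with √(n − 2).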
The abundance of genes in the serum of asthma patients and participants without asthma. The genes included CEBPE (A), HDC (B), PRR4 (C), SOD2 (D), and IRAK3 (E). **p < 0.001.
Relationships between genes and PM2.5 levels. The genes included CEBPE (A), HDC (B), PRR4 (C), SOD2 (D), and IRAK3 (E).
3.7 Univariate and multivariate logistic regression analyses
Univariate and multivariate logistic regression analyses were conducted to assess the independent predictive role of the five genes for asthma. With the presence or absence of asthma as the endpoint, the selected genes were entered as independent variables in univariate logistic regression models. All five genes (CEBPE, HDC, IRAK3, PRR4, and SOD2) were significantly associated with asthma (p < 0.05; Table 3). Multivariate logistic regression analysis confirmed that these five genes were independent risk factors for asthma (Table 4). They remained independent risk factors after adjustment for confounding factors such as sex and age (Table 5). For the adjusted multivariate model:
- the calibration curves showed good agreement between the observed outcomes and the predicted probabilities (Figure 7A);
- the decision curve analysis (DCA) showed large net benefits for predicting asthma (Figure 7B);
- the ROC curve indicated good discriminative ability (Figure 7C).

These findings suggest that the multivariate logistic regression model has relatively high accuracy.
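The columns of Tables 3 to 5 are linked by standard logistic regression identities:
- Wald = (B/SE)²;
- Exp(B) is the odds ratio;
- the 95% CI is exp(B ± 1.96·SE).

The sketch below recomputes these quantities from B and SE. Small differences from the tables are expected, because B and SE are printed to three decimals.

```python
import math


def logistic_summary(b, se, z=1.96):
    """Wald chi-square, odds ratio Exp(B), and 95% CI from a logistic regression
    coefficient B and its standard error."""
    wald = (b / se) ** 2
    odds_ratio = math.exp(b)
    ci_lower = math.exp(b - z * se)
    ci_upper = math.exp(b + z * se)
    return wald, odds_ratio, ci_lower, ci_upper


# CEBPE row of Table 3 (univariate): B = 2.640, SE = 0.264.
print([round(v, 3) for v in logistic_summary(2.640, 0.264)])
```

The very wide CI for SOD2 in Tables 4 and 5 follows directly from its large SE, since the interval width grows as exp(1.96·SE).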
The performance indicators of the multivariate logistic regression model after adjusting for confounding factors, including (A) the calibration curves, (B) the DCA curves, and (C) the ROC curve.
3.8 Significance test of the mediating effect
In this study, a mediation effect analysis was conducted to elucidate the potential mediating mechanisms between PM2.5 and the onset of asthma, involving several potential mediators, including CEBPE, HDC, IRAK3, PRR4, and SOD2. The results of the analysis indicated that partial mediation effects were observed in all mediation paths. For the path through which PM2.5 affects asthma through CEBPE, the indirect effect was 0.022, with a 95% confidence interval (CI) of (0.068, 0.161), a standard error (SE) of 0.024, a z/t value of 0.909, and a p-value of 0.363. Similarly, for the path where PM2.5 affects asthma through HDC, the indirect effect was 0.02 [95% CI (0.063, 0.148), p = 0.374], with an SE of 0.022 and a z/t value of 0.89. For the path where PM2.5 affects asthma through IRAK3, the indirect effect was 0.011 [95% CI (0.028, 0.085), p = 0.46], with an SE of 0.014 and a z/t value of 0.739. For the path where PM2.5 affects asthma through PRR4, the indirect effect was 0.012 [95% CI (0.033, 0.1), p = 0.475], with an SE of 0.017 and a z/t value of 0.714. Finally, for the path where PM2.5 affects asthma through SOD2, the indirect effect was 0.01 [95% CI (0.028, 0.074), p = 0.419], with an SE of 0.012 and a z/t value of 0.808 (Table 6 and Figure 8). These findings suggest that these variables may play a partial mediating role in the process by which PM2.5 affects the onset of asthma, providing important clues and serving as a reference for further in-depth research on the internal mechanisms by which PM2.5 induces asthma.
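In the mediation output in Table 6, the total effect equals ACME + ADE, and the proportion mediated is ACME divided by the total effect. The sketch below checks this decomposition for the five paths and encodes the classification rule given in the Methods. The p values are stand-ins for the reported "<0.001". The label for the case the rule does not cover is hypothetical.

```python
def classify_mediation(acme_p, ade_p, alpha=0.05):
    """Mediation type following the rule stated in the Methods."""
    if acme_p < alpha and ade_p > alpha:
        return "Complete mediation"
    if acme_p < alpha and ade_p < alpha:
        return "Partial mediation"
    if acme_p > alpha and ade_p > alpha:
        return "No mediation"
    return "Not covered by the stated rule"  # hypothetical label


# ACME and ADE estimates from Table 6; both p values were reported as <0.001.
table6 = {
    "CEBPE": (0.288, 0.209),
    "HDC": (0.283, 0.214),
    "PRR4": (0.248, 0.249),
    "IRAK3": (0.202, 0.295),
    "SOD2": (0.153, 0.344),
}
P_BELOW_0_001 = 0.0005  # stand-in value for "<0.001"

summary = {}
for gene, (acme, ade) in table6.items():
    total = acme + ade            # total effect = ACME + ADE
    prop_mediated = acme / total  # share of the total effect through the gene
    summary[gene] = (round(total, 3), round(prop_mediated, 3),
                     classify_mediation(P_BELOW_0_001, P_BELOW_0_001))
    print(gene, *summary[gene])
```

The recomputed proportions match Table 6 to within rounding, for example 0.579 versus 0.58 for CEBPE.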
Relationships among PM2.5, latent variables, and asthma.
4 Discussion
With rapid industrialization and urbanization, deteriorating air quality not only threatens the health of the general population but also worsens the challenges faced by asthma patients (18, 19). Studies have reported that prolonged exposure to polluted environments increases the incidence of asthma and also intensifies patients’ symptoms and the frequency of attacks (20). Each 10 μg/m³ increase in PM2.5 concentration is associated with a 3.5% increase in hospitalization rates among asthma patients (21, 22). We therefore aimed to identify molecules mediating the association between air pollution and the incidence of asthma.

Using GEO datasets, we identified 715 DEGs, including 363 upregulated and 352 downregulated genes. WGCNA revealed that the MEgrey module was most significantly related to asthma. A total of 19 overlapping genes were identified and subjected to machine learning-based feature selection. Five hub genes (CEBPE, HDC, IRAK3, PRR4, and SOD2) emerged as overlapping features across all three algorithms (LASSO regression, RF, and SVM-RFE). Univariate and multivariate logistic regression analyses showed that CEBPE, HDC, IRAK3, PRR4, and SOD2 were independent risk factors. Mediation analysis showed that these five genes played a mediating role in the association between PM2.5 and the onset of asthma.
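The reported figure of 3.5% per 10 μg/m³ can be rescaled to other increments only under an assumed exposure-response shape. Under a log-linear (multiplicative) relationship, an increase of Δ μg/m³ corresponds to a (1.035^(Δ/10) − 1) × 100% increase. The sketch below applies this to hypothetical increments.

```python
def scaled_percent_increase(pct_per_10, delta):
    """Percent increase for an increment of `delta` ug/m3, given a percent increase
    per 10 ug/m3, assuming a log-linear (multiplicative) exposure-response."""
    ratio_per_10 = 1 + pct_per_10 / 100
    return (ratio_per_10 ** (delta / 10) - 1) * 100


for delta in (10, 20, 35):  # hypothetical increments in ug/m3
    print(delta, round(scaled_percent_increase(3.5, delta), 2))
```

Under this assumption, effects compound: 20 μg/m³ gives about 7.1% rather than exactly 7.0%.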
Some studies have indicated that, besides external environmental factors, genetics (especially key genes) may play an important role in the pathogenesis of asthma (23, 24). With advances in gene chip technology in recent years, large amounts of expression data on disease onset and progression have been deposited in gene expression databases (25). Investigating key genes to elucidate the molecular mechanisms underlying the development and progression of asthma has become a research hotspot (26). Some studies have applied bioinformatics methods to gene chip data from bronchial epithelial samples of patients with severe asthma and healthy controls. These studies identified key DEGs and elucidated their roles in major signaling pathways during severe asthma onset (27, 28). However, they did not address whether these DEGs are involved in the relationship between air pollution and the pathogenesis of asthma.

To bridge this gap, we analyzed the mediating effects of DEGs between air pollution and the onset of asthma. Through differential gene expression analysis and WGCNA, we obtained 18 asthma-related candidate genes from two GEO datasets. We then applied three machine learning methods to these candidate genes to further narrow down the asthma-related hub genes. Five genes (CEBPE, HDC, IRAK3, PRR4, and SOD2) were identified as overlapping features by LASSO regression, RF, and SVM-RFE.

CEBPE belongs to the C/EBP family, which participates in the development of inflammatory diseases. Yamini V Virkud found a correlation between CEBP and immunoglobulin E levels in the “Genetics of Asthma in Costa Rica Study” and “Childhood Asthma Management Program” cohorts, indicating that CEBP may be involved in the molecular mechanism of asthma development (29). Some studies have found that HDC is involved in the regulation of goblet cell hyperplasia in allergic airway inflammation (30, 31).
Through microarray screening and subsequent validation, Winter et al. reported that HDC levels were higher in patients with eosinophilic asthma than in those with non-eosinophilic asthma (32), which agrees with our findings. Zhong et al. identified PRR4 as an asthma-specific gene using bioinformatics approaches (33), which is also consistent with our data. Mendelian randomization analysis has shown that SOD is involved in air pollutant-mediated nasal mucosal biology (34), and SOD2 has been associated with a lower risk of childhood bronchial asthma (35). Consistent with our results, Zhang et al. used machine learning to identify SOD as a mitochondria-related biomarker with diagnostic value for neutrophilic asthma (36). Genetic variants of SOD2, which reflect oxidative stress metabolism, may modify the association between phthalate exposure and asthma (37). IRAK3 encodes a protein that negatively regulates the Toll-like receptor signaling pathways involved in innate host defense and in the control of adaptive immune responses (38). IRAK3 expression differed significantly among healthy individuals, patients with chronic persistent asthma, and patients with acute exacerbation of asthma (39). Taken together, these genes may provide new insights into the molecular mechanisms underlying air pollution-induced asthma and may serve as biomarkers for its prevention and treatment. GSEA showed that these genes are associated with pathways related to asthma, antigen processing and presentation, and IgA production, further suggesting that they may be involved in the onset of asthma caused by environmental pollution.
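GSEA scores a gene set by walking down a ranked gene list and tracking a running sum that rises at gene-set members and falls otherwise. The snippet below is a minimal sketch of that weighted running-sum enrichment score, assuming the standard formulation with weight exponent 1; it omits permutation testing, normalization, and multiple-testing correction. The gene names and scores are hypothetical, and the GSEA implementation and gene-set collection used in this study are not specified here.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted running-sum enrichment score (GSEA-style).

    ranked: list of (gene, score) pairs sorted by score, highest first.
    gene_set: collection of gene names to test.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_total, n_hits = len(ranked), sum(hits)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    # Hits step up in proportion to |score|**weight; misses step down equally.
    hit_norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have zero score")
    miss_step = 1.0 / (n_total - n_hits)
    running, best = 0.0, 0.0
    for (_, score), is_hit in zip(ranked, hits):
        running += abs(score) ** weight / hit_norm if is_hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranked list (e.g., genes ranked by a correlation statistic).
ranked = [("G1", 3.0), ("G2", 2.0), ("G3", 1.0), ("G4", -1.0), ("G5", -2.0)]
print(enrichment_score(ranked, {"G1", "G2"}))  # set at the top -> positive score
print(enrichment_score(ranked, {"G4", "G5"}))  # set at the bottom -> negative score
```

A set concentrated near the top of the ranking yields a large positive score, which is the pattern a reported "enriched" pathway reflects before significance is assessed by permutation.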
To determine the role of these key genes in asthma caused by environmental pollution, we measured their expression in the validation cohort and found that the expression of CEBPE, HDC, IRAK3, PRR4, and SOD2 was significantly higher in the 160 participants with asthma than in the 160 participants without asthma. This finding is consistent with the multivariate logistic regression analysis, in which all five genes were independent risk factors for asthma, and they remained so after adjustment for gender and age. Importantly, decision curve analysis (DCA), receiver operating characteristic (ROC) curves, and calibration curves indicated that the multivariate logistic regression model performed reliably. Given the significant linear correlation between the levels of these genes and PM2.5 levels, we speculated that these genes mediate asthma onset induced by environmental pollution. We therefore conducted a mediation analysis to determine whether these genes mediate the association between PM2.5 and asthma onset. All five genes (CEBPE, HDC, IRAK3, PRR4, and SOD2) showed a significant mediating effect, suggesting that PM2.5 may alter the expression of these genes and thereby contribute to asthma onset. We speculate that these effects may be attributable to the regulatory roles of these genes in immune signaling and inflammatory pathways, as suggested by GSEA. CEBPE, for example, is involved in myeloid cell development and the induction of several inflammatory mediators; CEBPE knockdown prevents the production of functional neutrophils and eosinophils, impairing chemotaxis and bactericidal activity (40). HDC, a histamine-producing enzyme, is involved in the late and chronic phases of allergic and nonallergic inflammation (41), and increased HDC levels may promote asthma onset.
IRAK3 plays a key role in maintaining chronic Th2 airway inflammation by inhibiting lung dendritic cell-mediated Th1 activation (42). PRR4 has been reported as the most important gene regulating the Th2 endotype and shows the strongest correlation with resting mast cells (43). Several studies have shown that SOD2 regulates allergic lung inflammation by controlling mitochondrial ROS production (44, 45). Together, these studies suggest that environmental air pollution may influence asthma risk by modulating the expression of these genes. These findings provide new clues to how air pollution affects respiratory health and may inform strategies for the prevention and treatment of asthma.
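The mediation analysis asks how much of the PM2.5 effect passes through a gene. A common way to express this is the product-of-coefficients approach: the indirect effect is a·b, where a is the slope of the mediator on the exposure and b is the slope of the outcome on the mediator adjusting for exposure. The snippet below is a minimal sketch of that approach using ordinary least squares on continuous toy data. This is a simplification and an assumption: the study's outcome (asthma onset) is binary, so a real analysis would typically use a logistic outcome model, covariate adjustment, and bootstrap confidence intervals, and the exact method used in the study is not specified here. All data values and function names are hypothetical.

```python
from statistics import fmean

def _centered(values):
    mean = fmean(values)
    return [v - mean for v in values]

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def product_of_coefficients(x, m, y):
    """OLS product-of-coefficients mediation for continuous X, M, Y.

    Returns (a, b, c_prime, c):
      a        slope of M ~ X
      b        slope of M in Y ~ X + M
      c_prime  direct effect (slope of X in Y ~ X + M)
      c        total effect (slope of Y ~ X)
    """
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)
    det = sxx * smm - sxm ** 2
    if sxx == 0 or det == 0:
        raise ValueError("X is constant or X and M are perfectly collinear")
    a = sxm / sxx
    c = sxy / sxx
    # Two-predictor OLS slopes from the centered normal equations.
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return a, b, c_prime, c

# Hypothetical toy data: exposure x, mediator m (about 2*x), outcome y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
m = [2 * xi + e for xi, e in zip(x, [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1])]
y = [0.5 * xi + 1.5 * mi + e
     for xi, mi, e in zip(x, m, [0.1, 0.0, -0.2, 0.1, 0.3, -0.1, 0.0, -0.2])]

a, b, c_prime, c = product_of_coefficients(x, m, y)
indirect = a * b
print(f"indirect={indirect:.3f} direct={c_prime:.3f} total={c:.3f}")
print(f"proportion mediated={indirect / c:.2%}")
```

With linear models fitted by OLS, the total effect decomposes exactly as c = c′ + a·b, which is why the proportion mediated is well defined here; with a logistic outcome model this identity no longer holds exactly, which is one reason counterfactual-based mediation methods are often preferred for binary outcomes.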
This study has some limitations. First, although 320 participants were included, the sample size may be small for drawing conclusions about a specific region or the entire population with asthma, which limits the generalizability and representativeness of the results. Second, participants in the clinical validation cohort were recruited from a single city, and differences in air pollution components and levels among regions may limit the extrapolation of these findings to other areas; multicenter studies are therefore needed. Third, individual differences in exposure arising from occupation, commuting method, and the use of indoor air purification equipment may have led to non-differential exposure misclassification. Individual-level assessment of the pollution experienced during the exposure period will be necessary to identify the mediating factors more precisely. Finally, complex interactions between genes were not fully considered in this study, which may have biased the results. Future in-depth mechanistic studies will help clarify the roles of these genes in the association between PM2.5 and asthma onset.
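Non-differential misclassification means exposure is measured with the same error in cases and controls. For a binary exposure with sensitivity plus specificity above 1, this typically pulls the odds ratio toward 1, so any such bias would tend to understate rather than inflate associations. The snippet below is a minimal worked sketch of that attenuation. It assumes PM2.5 exposure has been dichotomized into exposed and unexposed, which is an illustrative simplification, and all counts, sensitivity, and specificity values are hypothetical, not study data.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2x2 exposure-by-disease table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected (observed exposed, observed unexposed) counts when true
    exposure status is recorded with the given sensitivity and specificity."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    return observed_exposed, exposed + unexposed - observed_exposed

# Hypothetical true counts (not study data): 100 cases, 100 controls.
cases = (60, 40)      # (exposed, unexposed)
controls = (40, 60)

true_or = odds_ratio(*cases, *controls)

# Non-differential: the same sensitivity/specificity in cases and controls.
se, sp = 0.8, 0.8
obs_cases = misclassify(*cases, se, sp)
obs_controls = misclassify(*controls, se, sp)
observed_or = odds_ratio(*obs_cases, *obs_controls)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
# true OR = 2.25, observed OR = 1.62
```

For continuous or multi-category exposures the direction of bias is less predictable, so this binary illustration should not be read as a guarantee about the study's own estimates.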
5 Conclusion
Using bioinformatics analysis and machine learning, we identified five genes (CEBPE, HDC, IRAK3, PRR4, and SOD2) associated with asthma. This study provides evidence that these genes mediate the association between PM2.5 and asthma onset. These findings enrich our understanding of how air pollution contributes to asthma onset and provide new information for the future prevention and treatment of asthma.
1. Lommatzsch M, Brusselle GG, Levy ML, Canonica GW, Pavord ID, Schatz M, et al. A²BCD: a concise guide for asthma management. Lancet Respir Med. (2023) 11:573–6. doi: 10.1016/S2213-2600(22)00490-8, PMID: 36716752
2. Wan R, Srikaram P, Guntupalli V, Hu C, Chen Q, Gao P. Cellular senescence in asthma: from pathogenesis to therapeutic challenges. EBioMedicine. (2023) 94:104717. doi: 10.1016/j.ebiom.2023.104717, PMID: 37442061, PMCID: PMC10362295
3. Xing Y, Leung AS, Wong GW. From preschool wheezing to asthma: environmental determinants. Pediatr Allergy Immunol. (2023) 34:e14049. doi: 10.1111/pai.14049, PMID: 38010001
4. Melén E, Zar HJ, Siroux V, Shaw D, Saglani S, Koppelman GH, et al. Asthma inception: epidemiologic risk factors and natural history across the life course. Am J Respir Crit Care Med. (2024) 210:737–54. doi: 10.1164/rccm.202312-2249SO, PMID: 38981012, PMCID: PMC11418887
5. Liu K, Hua S, Song L. PM2.5 exposure and asthma development: the key role of oxidative stress. Oxid Med Cell Longev. (2022) 2022:3618806. doi: 10.1155/2022/3618806, PMID: 35419163, PMCID: PMC9001082
6. Agache I, Canelo-Aybar C, Annesi-Maesano I, Cecchi L, Rigau D, Rodríguez-Tanta LY, et al. The impact of outdoor pollution and extreme temperatures on asthma-related outcomes: a systematic review for the EAACI guidelines on environmental science for allergic diseases and asthma. Allergy. (2024) 79:1725–60. doi: 10.1111/all.16041, PMID: 38311978
7. Cheng PP, Yu F, Chen SJ, Feng X, Jia ZH, Hu SH, et al. PM2.5 exposure-induced senescence-associated secretory phenotype in airway smooth muscle cells contributes to airway remodeling. Environ Pollut. (2024) 347:123674. doi: 10.1016/j.envpol.2024.123674, PMID: 38458517
8. Piao CH, Fan Y, Nguyen TV, Song CH, Kim HT, Chai OH. PM2.5 exposure regulates Th1/Th2/Th17 cytokine production through NF-κB signaling in combined allergic rhinitis and asthma syndrome. Int Immunopharmacol. (2023) 119:110254. doi: 10.1016/j.intimp.2023.110254, PMID: 37163921
