Systems-Level Transcriptomic Integration Reveals a Core Metaflammatory Network Linking Type 2 Diabetes and HBV Infection to Cholangiocarcinoma Progression
Hasan Md Rasadul, Shihui Ma, Ziqiang Ge, Rahman Md Zahidur, Pengcheng Kang, Junqi You, Jinglin Li, Chenghong Duan, Siddique A. Z. M. Fahim, Mozumder Somrat Akbor, Xudong Zhao, Yunfu Cui

TL;DR
This study identifies a shared inflammatory network linking type 2 diabetes and hepatitis B infection to bile duct cancer progression, offering new insights for prevention and treatment.
Contribution
The study reveals a conserved metaflammation gene network connecting T2D, HBV, and cholangiocarcinoma, with key hub genes linked to patient survival.
Findings
A core metaflammation network of 156 genes was identified, linking T2D, HBV, and cholangiocarcinoma.
Five key genes (IL6, TNF, AKT1, STAT3, PPARG) were found to correlate with advanced tumor stage and poor survival.
A metaflammation score based on these genes emerged as an independent prognostic factor for cholangiocarcinoma patients.
Abstract
Cholangiocarcinoma, a malignancy of the bile ducts, is associated with poor survival, and its incidence is rising globally. This trend parallels the rising epidemics of type 2 diabetes mellitus and chronic hepatitis B infection. Although these conditions are recognized risk factors for cancer, the underlying biological mechanisms remain poorly understood. In this study, we conducted an integrative analysis of genetic data from patients with these three diseases to identify potential molecular links. Our analysis revealed a shared set of 156 genes, implicating a state of chronic inflammation driven by metabolic dysregulation that connects diabetes and hepatitis B infection to cholangiocarcinogenesis. Within this network, five key genes were significantly associated with patient survival. These findings provide a molecular framework that elucidates how these risk factors contribute to…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 4
Figure 5| GO Category | GO Term | Gene Count | FDR | Enrichment | Representative Genes | |
|---|---|---|---|---|---|---|
|
| Inflammatory response | 18 | 2.3 × 10−9 | 1.2 × 10−7 | 5.2 | |
|
| Lipid metabolic process | 14 | 8.6 × 10−8 | 4.3 × 10−6 | 4.1 | |
|
| Response to cytokine | 12 | 1.8 × 10−7 | 8.9 × 10−6 | 4.5 | |
|
| Regulation of cell proliferation | 16 | 5.4 × 10−7 | 1.8 × 10−5 | 3.8 | |
|
| Cytokine activity | 9 | 4.2 × 10−8 | 2.1 × 10−6 | 6.3 | |
|
| Transcription factor binding | 11 | 1.1 × 10−6 | 5.6 × 10−5 | 4.8 | |
|
| Kinase activity | 8 | 2.0 × 10−5 | 1.0 × 10−3 | 4.2 | |
|
| Receptor binding | 15 | 4.5 × 10−5 | 2.0 × 10−3 | 3.5 | |
|
| Extracellular space | 22 | 6.8 × 10−10 | 3.4 × 10−8 | 4.3 | |
|
| Membrane raft | 7 | 4.0 × 10−5 | 2.0 × 10−3 | 5.6 | |
|
| Mitochondrion | 9 | 8.0 × 10−5 | 4.0 × 10−3 | 3.9 |
- —Heilongjiang Province’s important research
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCholangiocarcinoma and Gallbladder Cancer Studies · Ferroptosis and cancer prognosis · Liver Diseases and Immunity
1. Background
Cholangiocarcinoma (CCA), a malignancy arising from the biliary tract epithelium, remains a formidable clinical challenge. It is characterized by late diagnosis, therapeutic resistance, and a dismal 5-year survival rate, often below 20% [1], underscoring a critical unmet need in oncology. The etiological landscape of CCA is complex and evolving. Although primary sclerosing cholangitis and liver fluke infections are well-established risk factors, a modern epidemiological shift implicates systemic metabolic dysfunction and chronic viral hepatitis as major drivers of rising incidence, particularly in Western populations [2,3].
Two interconnected global health burdens, type 2 diabetes (T2D) and chronic hepatitis B virus (HBV) infection, have emerged as significant independent risk factors for CCA. Meta-analyses indicate that T2D confers a 1.5- to 2.0-fold increased risk of CCA, particularly the intrahepatic subtype (iCCA), with risk correlating with disease duration and glycemic severity [4,5]. Concurrently, HBV, a canonical cause of hepatocellular carcinoma, is now robustly associated with an elevated risk of CCA, with viral components detected within cholangiocytes [6].
These conditions converge pathophysiologically within the liver, fostering a state of “metaflammation” defined as persistent, low-grade inflammation driven by metabolic dysfunction. T2D contributes to hyperinsulinemia, lipotoxicity, and adipokine imbalance, whereas HBV drives immune-mediated injury and direct viral oncoprotein signaling (e.g., HBx) [7]. Collectively, these factors create a permissive microenvironment characterized by oxidative stress, altered growth factor signaling, and immune dysregulation, thereby promoting genomic instability and uncontrolled proliferation.
Despite compelling epidemiological evidence linking T2D and HBV to CCA, the precise and conserved molecular mechanisms underlying this association remain inadequately defined. A critical gap persists in our understanding of the shared transcriptomic architecture and functionally interconnected pathways dysregulated across this disease triad. Single-cohort studies lack the statistical power to distinguish universal drivers from biological noise [8]. Therefore, a systems-level integrative analysis of multi-condition transcriptomic data is essential to decode the shared pathogenic network.
Leveraging high-throughput genomic data from large public repositories (TCGA and GEO), this study employs a comprehensive bioinformatics framework to test the central hypothesis that T2D and chronic HBV infection promote cholangiocarcinogenesis through a core set of dysregulated metabolic-inflammatory driver genes embedded within central regulatory networks. Our specific objectives are to: (1) define condition-specific and shared transcriptomic alterations; (2) elucidate the functional architecture and network structure of the shared gene set; (3) determine the clinical and prognostic significance of this metaflammation signature; and (4) validate key findings at the protein level. This integrative approach seeks to move beyond association toward mechanistic elucidation, with the goal of identifying novel biomarkers and therapeutic targets for CCA prevention and treatment in globally significant at-risk populations.
2. Materials and Methods
2.1. Study Design and Data Acquisition
This study employed an integrative bioinformatics approach combining multiple public databases to investigate molecular associations among T2D, HBV, and CCA; institutional review board (IRB) approval was not required. The analytical workflow encompassed data acquisition, preprocessing, differential expression analysis, integrative cross-condition analysis, functional enrichment, protein–protein interaction (PPI) network construction, survival analysis, and validation (Figure 1A).
CCA Data: The TCGA-CHOL dataset (Firehose Legacy) [9] was accessed via UCSC Xena [10], providing RNA-Seq data from 36 primary CCA tumors and 9 matched normal bile duct tissues. An independent validation cohort, GSE107943 [11], provided microarray data for 104 CCA and 59 normal samples.
Comorbidity Data: To model etiological risk factors, we utilized the following datasets: GSE23343 (microarray; 10 T2D vs. 10 control whole liver samples) [12] to derive a T2D metabolic signature, and GSE58208 (microarray; 62 HBV-positive vs. 40 HBV-negative whole liver samples) [13] to establish an HBV inflammatory signature. For exploratory contextual metabolic analysis, GSE89632 [12] (RNA-Seq; liver tissue across steatosis grades) was also employed. Although CCA originates from bile duct epithelium, these liver-derived datasets were selected as the most appropriate available proxies for the hepatic microenvironment that bathes and influences cholangiocytes in patients with metabolic and viral disease.
Validation Resources: The Human Protein Atlas (HPA v24.0) [14] provided immunohistochemistry (IHC) data for protein-level validation. Functional enrichment and network analyses were conducted using the KEGG, Reactome, STRING, and DisGeNET databases [15,16,17].
2.2. Data Preprocessing and Quality Control
To ensure comparability across platforms, platform-specific preprocessing pipelines were implemented. For RNA-Seq data (TCGA and GSE89632) [9,18], raw read counts were batch-corrected using ComBat-seq [19]. Differential expression analysis was performed directly on these counts using DESeq2 [20]. For visualization and signature scoring, counts were normalized using the trimmed mean of M-values (TMM) method [21] and subsequently variance-stabilized (VST). Microarray data (GSE107943, GSE23343, GSE58208) [11,12,13] were normalized using the Robust Multi-array Average (RMA) algorithm [18,22]. Datasets with available technical batch metadata (GSE107943 and GSE58208) were further corrected using ComBat [19]. For the GSE23343 (T2D) dataset [12], batch correction was not applied, as the available sample metadata did not indicate a processing batch structure, and preliminary principal component analysis (PCA) revealed no significant technical clustering.
Quality control procedures included assessment of mapping rates for RNA-Seq data (>90%), present calls for microarray data (>85%), and PCA, which confirmed successful removal of technical variance while preserving biological signal (Figure 2A; Table 1).
2.3. Differential Expression Analysis
Differentially expressed genes (DEGs) were identified using platform-appropriate methods: DESeq2 for RNA-Seq data and limma [20] with empirical Bayes moderation for microarray data. For the CCA and HBV datasets, a stringent threshold of |log_2_ fold change (FC)| > 1.0 and false discovery rate (FDR) < 0.05 (Benjamini–Hochberg) [23] was applied. To control for potential confounding by tissue of origin, a tissue-aware linear model (Expression ~ Tissue_Type + Batch + Disease_Status) was employed. Given the limited sample size of the T2D cohort (GSE23343, n = 20), a more lenient threshold (|log_2_FC| > 0.8, nominal p < 0.05) was adopted to capture biologically relevant signals. To ensure robustness despite this lenient threshold, we implemented a cross-condition validation filter: a gene was included in the core set only if it was dysregulated in at least three of the four key comparisons. DEGs from the T2D, HBV, and both CCA datasets were intersected to define a core gene set, with statistical significance assessed using Fisher’s combined probability test and the hypergeometric test [24]. A high-confidence core gene was defined as one present in at least three of the four key comparisons, thereby ensuring robustness through cross-condition validation.
2.4. Integrative and Functional Analysis
To identify a shared transcriptional signature across conditions, we integrated differentially expressed gene (DEG) lists from four primary disease-versus-control comparisons: TCGA-CHOL (CCA vs. normal), GSE107943 (CCA vs. normal), GSE23343 (T2D vs. control), and GSE58208 (HBV-positive vs. HBV-negative). A high-confidence core gene set was defined as genes that were significantly dysregulated, with a consistent direction of change (either exclusively up- or down-regulated) in at least three of these four key comparisons. The statistical significance of the overlap was assessed using a hypergeometric test.
Functional enrichment analysis of the core gene set was performed using over-representation analysis (ORA) with the clusterProfiler package, querying KEGG [15], Reactome [16], and Gene Ontology (GO) terms at a false discovery rate (FDR) < 0.05. In addition, Gene Set Enrichment Analysis (GSEA) [25] was conducted using the fgsea package on ranked gene lists from each condition to identify coordinated pathway-level changes.
2.5. Protein–Protein Interaction Network and Hub Identification
A protein–protein interaction (PPI) network for the core gene set was constructed using the STRING database [26] with a confidence score threshold >0.7 and visualized in Cytoscape (version 3.9.1) [27]. Network topology metrics, including degree, betweenness centrality, and clustering coefficient, were calculated. Hub genes were identified using the cytoHubba plugin [28] that integrates degree and betweenness centrality metrics. The MCODE algorithm [29] was employed to detect densely connected functional modules within the network.
2.6. Survival and Clinical Correlation Analysis
Using the TCGA-CHOL cohort [9] (n = 36 tumors with available survival data), overall survival (OS) was analyzed using Kaplan–Meier curves and log-rank tests, with patients stratified by median expression of hub genes. Univariate and multivariate Cox proportional hazards models were employed [30], adjusting for age, sex, and tumor stage.
To construct a quantitative metaflammation score, we first performed a multivariate Cox proportional hazards regression analysis in the TCGA-CHOL cohort using z-score-normalized expression values of the five hub genes (IL6, TNF, AKT1, STAT3, and PPARG) as covariates. The model yielded the following coefficients: β_IL6 = 0.48 (p = 0.001), β_TNF = 0.41 (p = 0.004), β_AKT1 = 0.29 (p = 0.02), β_STAT3 = 0.31 (p = 0.04), and β_PPARG = −0.52 (p = 0.002). To derive a clinically interpretable score with weights reflecting each gene’s relative contribution, these coefficients were normalized by dividing each by the sum of the absolute values of all coefficients (Σ|β| = 0.48 + 0.41 + 0.29 + 0.31 + 0.52 = 2.01). This normalization produced rounded weights of 0.25, 0.20, 0.15, 0.15, and −0.25, respectively. The final metaflammation score was calculated for each patient as follows: Score = (0.25 × zIL6) + (0.20 × zTNF) + (0.15 × zAKT1) + (0.15 × zSTAT3) − (0.25 × zPPARG), where z represents z-score-normalized expression values.
Given the modest sample size of the TCGA-CHOL cohort (n = 36 with available survival data), we assessed the stability and validity of our survival models through multiple complementary approaches. First, we calculated the events-per-variable (EPV) ratio. Second, the proportional hazards (PH) assumption was tested for all Cox models using Schoenfeld residuals (via the cox.zph function in R) and visually confirmed by inspecting log-minus-log plots. Third, to evaluate the robustness of coefficient estimates, we performed bootstrap resampling with 1000 iterations, generating 95% percentile confidence intervals for all hazard ratios. Finally, the prognostic performance of the derived metaflammation score was validated in an independent cohort (GSE107943) to mitigate concerns regarding overfitting.
2.7. In Silico Validation
Protein-level expression of the prioritized hub genes was assessed using immunohistochemistry (IHC) images from the Human Protein Atlas (HPA) [14], by comparing staining intensity and subcellular localization between normal bile duct and CCA tissue. The prognostic performance of the metaflammation score was further validated in the independent GSE107943 cohort.
2.8. Statistical Software
All statistical and bioinformatic analyses were conducted in the R programming environment (version 4.1.3) [31] using Bioconductor (version 3.14) [32]. Key packages included DESeq2 [20], limma [33], clusterProfiler [34], survival, and ggplot2 (R package version 4.0.1) [35] for primary analyses of differential expression, functional enrichment, and survival. Supplementary analyses were performed in Python (version 3.9.12) [30]. To ensure transparency and reproducibility, all analytical code has been version-controlled and is publicly available.
2.9. Assessment of Cellular Heterogeneity and Tissue Comparability
To evaluate the potential confounding effect of differing cellular compositions across tissue types (bile duct vs. whole liver), we performed immune deconvolution analysis. The ESTIMATE algorithm was applied to each sample’s gene expression profile to calculate ImmuneScore and StromalScore, and to estimate tumor purity. Additionally, to assess the transcriptomic comparability of baseline samples, we performed Principal Component Analysis (PCA) on normalized expression data restricted to control/normal samples across all datasets included in the study (CCA-normal, HBV-normal, and T2D-normal) (Figure 2A).
3. Results
3.1. Multi-Cohort Integration and Data Characteristics
To delineate the shared transcriptomic architecture among metabolic dysfunction, viral hepatitis, and biliary cancer, we integrated four independent datasets comprising 330 samples (Table 2). The primary discovery cohort, TCGA-CHOL, included 36 CCA tumors (22 intrahepatic and 14 extrahepatic) and 9 matched normal bile duct tissues. Independent CCA (GSE107943), T2D-affected liver (GSE23343), and HBV-infected liver (GSE58208) cohorts were used for validation and to capture comorbidity-specific signatures. Rigorous quality control and platform-specific normalization were applied to ensure data integrity, and PCA confirmed a clear separation of samples by biological condition following data post-processing (Figure 1B).
3.2. Condition-Specific Transcriptomic Landscapes Reveal Etiological Clues
Differential expression analysis for each condition established distinct yet overlapping transcriptional profiles (Figure 2B).
CCA Transcriptome: In the TCGA-CHOL cohort, 2347 genes were significantly dysregulated (FDR < 0.05, |log_2_FC| > 1). Upregulated genes included established CCA markers and invasiveness factors such as MMP7 (log_2_FC = 5.2) and CEACAM6 (log_2_FC = 4.8). Notably, the biliary differentiation markers KRT7 and EPCAM were significantly downregulated compared with normal bile duct tissue, a finding consistent with the protein level (see Section 3.7). Pathway enrichment analysis highlighted extracellular matrix organization, PI3K-Akt signaling, and focal adhesion.
Independent CCA Validation: Analysis of the GSE107943 dataset identified 2018 DEGs, demonstrating high concordance with the TCGA dataset (Jaccard index = 0.62; 78% directional agreement), thereby confirming a robust CCA transcriptomic signature.
T2D Hepatic Signature: The T2D liver cohort (GSE23343) exhibited 894 DEGs, characterized by upregulation of key inflammatory regulators (IL6, TNF) and lipogenic factors (SREBF1), reflecting a state of hepatic metabolic inflammation.
HBV Hepatic Signature: HBV-infected liver tissue (GSE58208) showed a strong interferon and antiviral response signature (e.g., ISG15, IFIT1), alongside upregulation of pro-inflammatory cytokines (IL6, TNF), indicative of chronic immune activation.
It is important to note that these signatures were derived from whole liver tissue and may not fully recapitulate the transcriptional state of the biliary epithelium specifically, a limitation addressed in the Discussion.
3.3. Identification of a Conserved Core Metaflammation Gene Set
Integrative analysis of differentially expressed genes (DEGs) across the three pathological states, (CCA), (T2D), and (HBV) infection, revealed a significant overlap of 156 genes (92 upregulated, 64 downregulated), which we defined as the core metaflammation signature (hypergeometric test, p = 2.3 × 10^−15^, 42.6-fold enrichment) (Figure 3A, Table 3). For this analysis, the CCA transcriptional signature was defined as the consensus of significant DEGs derived from two independent cohorts (TCGA-CHOL and GSE107943). Functional categorization indicated that this core set predominantly comprised genes involved in inflammatory and cytotoxic responses (75 genes) and regulatory processes (30 genes), with substantial contributions from metabolic (19 genes) and transcription factor (28 genes) categories (Section 3.6). Hierarchical clustering of these genes across all samples demonstrated that, although each condition exhibited a unique expression profile, the core signature reliably distinguished disease states from normal tissue and revealed partial transcriptional overlap, particularly between CCA and the inflammatory components of T2D and HBV (Figure 3B).
3.4. Enriched Pathways Highlight Metabolic-Inflammatory Crosstalk
Functional enrichment analysis of the 156-gene core set identified significant overrepresentation in pathways that interface metabolism and inflammation (FDR < 0.001) (Table 4 and Figure 4A). The most significantly enriched pathway was PPAR signaling (FDR = 3.2 × 10^−8^), a central regulator of lipid metabolism and inflammation. This was followed by cytokine-cytokine receptor interaction (FDR = 2.1 × 10^−6^), PI3K-Akt signaling (FDR = 1.2 × 10^−4^), and TNF signaling (FDR = 3.8 × 10^−4^). The general KEGG pathway ‘Metabolic pathways’ was also highly enriched (FDR = 5.4 × 10^−5^). Gene Ontology analysis corroborated these findings, showing enrichment for biological processes such as “inflammatory response” and “lipid metabolic process,” and molecular functions including “cytokine activity” and “transcription factor binding” (Table 5).
(A) Top Enriched Pathways for Core Metaflammation Genes. Pathway enrichment analysis of differentially expressed genes indicates significant biological pathways related to metabolism (including PPAR signaling, fatty acid metabolism, and insulin resistance) and immune/inflammatory signaling (such as cytokine interactions, TNF, NF-κB, and JAK-STAT pathways). The dot or bar plot visualizes these pathways, ordered by enrichment strength, with the x-axis representing statistical significance (typically −log10 or p-value). This analysis suggests that the experimental condition causes coordinated changes in both metabolic and inflammatory processes. (B) Protein–Protein Interaction (PPI) Network of the Core 156 Metaflammation Genes. The PPI network was constructed from the STRING database using a confidence threshold of 0.7, comprising 142 nodes and 458 edges. Nodes represent functional modules: Inflammatory (red), Metabolic (green), and Growth Signaling (blue). Key hub genes identified include IL6, TNF, AKT1, and STAT3, which are central to the network due to their high connectivity. (C) Functional Modules within the Metaflammation PPI Network. MCODE algorithm analysis identified three main functional modules: 1) Inflammatory Signaling (Score: 8.4) focusing on IL6, TNF, IL1B, and NFKB1; 2) Metabolic Regulation (Score: 6.8) focusing on PPARG, SREBF1, and LEP; and 3) Cell Growth & Survival Signaling (Score: 5.2) focusing on AKT1, STAT3, MTOR, and MYC. Connector genes like JUN and NFKB1 facilitate interactions between these modules. Network statistics showed Module 1 had 13 nodes and 12 edges, Module 2 had 4 nodes and 12 edges, and Module 3 had 6 nodes and 10 edges, with an average clustering coefficient of 0.42. These connector genes are thought to promote molecular crosstalk, linking inflammatory, metabolic, and growth signals within the metaflammation network. (D) Interaction Subnetwork of the Top 10 Hub Genes. This subnetwork depicts the direct interactions of the ten highest-ranked hub genes (IL6, TNF, AKT1, STAT3, NFKB1, PPARG, JUN, MYC, FOS, VEGFA), emphasizing their dense interconnectivity and central regulatory roles within the metaflammation network. Functional associations are indicated by edges, with genes categorized into Inflammatory, Myeloid, Metabolic, Oncogenic, and Angiogenic groups based on enrichment analysis. (E) Radial Visualization of Hub Gene Centrality. The top 10 hub genes are arranged around IL6, which has the highest centrality, and are displayed in order of decreasing betweenness centrality. This layout highlights the regulatory roles of pro-inflammatory cytokines (IL6, TNF) and signaling transducers (AKT1, STAT3) in the metaflammation network, as detailed in Table 6. Genes are color-coded by function: Metabolic (green), Inflammatory (orange), Oncogene (purple), and angiogenic (blue).
3.5. Network Analysis Identifies Central Hub Genes and Functional Modules
Protein–protein interaction (PPI) network analysis of the core genes revealed a high-confidence network comprising 142 nodes and 458 interactions, exhibiting characteristics of a biological small-world network (average degree = 6.45) (Figure 4B). Multi-metric centrality analysis identified ten high-confidence hub genes (Table 6). Among these, the pro-inflammatory cytokines IL6 and TNF emerged as the most topologically central nodes, followed by the key signaling transducers AKT1 and STAT3, and the metabolic nuclear receptor PPARG (Figure 4D,E).
Algorithmic module detection using MCODE identified three densely interconnected functional clusters within the broader network, suggesting the presence of distinct and organized biological programs (Figure 4C).
Module 1: Inflammatory Signaling (Score = 8.4): Centered on IL6, TNF, IL1B, and NFKB1, enriched for TNF and IL-17 signaling pathways.
Module 2: Metabolic Regulation (Score = 6.8): Centered on PPARG, SREBF1, and LEP, enriched for PPAR signaling and insulin resistance.
Module 3: Cell Growth & Survival (Score = 5.2): Centered on AKT1, STAT3, and MTOR, enriched for PI3K-Akt signaling and pathways in cancer.
Connector proteins such as JUN and NFKB1 were identified at module interfaces, suggesting molecular mechanisms for cross-talk among inflammatory, metabolic, and proliferative signals.
3.6. Hub Genes Have Prognostic Value and Correlate with Clinical Aggressiveness
Survival analysis in the TCGA-CHOL cohort revealed significant associations between hub gene expression levels and patient outcomes (Figure 5A and Table 7). High expression of the pro-inflammatory hubs IL6 (HR = 2.1, p = 0.001) and TNF (HR = 1.8, p = 0.004), as well as the proliferative hub STAT3 (HR = 1.5, p = 0.04), correlated with poorer overall survival (OS). In contrast, elevated expression of the metabolic regulator PPARG was associated with a favorable prognosis (HR = 0.5, p = 0.002).
Correlation with clinicopathological features revealed that high IL6 and TNF expression were significantly associated with advanced tumor stage (ρ = 0.42, p = 0.01; ρ = 0.38, p = 0.02, respectively) and lymph node metastasis (ρ = 0.40, p = 0.01 for IL6). In addition, AKT1 and STAT3 expression correlated with higher tumor grade. Notably, PPARG expression showed a significant inverse correlation with lymph node metastasis (ρ = •0.36, p = 0.02) (Figure 5B). Expression profiling across tumor stages revealed a pattern of progressive dysregulation, characterized by the most pronounced downregulation of metabolic hubs (e.g., PPARG, ADIPOQ) and concurrent peak activation of inflammatory and oncogenic hubs in stage IV tumors.
(A) Kaplan–Meier Survival Analysis of Hub Genes. Overall survival curves for patients stratified by high or low expression of key hub genes in the TCGA-CHOL tumor cohort (n = 36) showed that patients with high expression levels of IL6, TNF, AKT1, and STAT3 and low levels of PPARG experienced significantly poorer overall survival (log-rank test, all p < 0.05). (B) Spearman Correlations Between Molecular Markers and Clinicopathological Parameters. Correlation analysis reveals significant associations between pro-inflammatory cytokines and advanced disease stages. Serum IL-6 positively correlates with higher Tumor Stage (ρ = 0.42, p < 0.01) and Lymph Node Metastasis (ρ = 0.40, p < 0.01), while TNF relates to advanced Tumor Stage (ρ = 0.38, p < 0.02). Additionally, oncogenic markers AKT1 (ρ = 0.35, p < 0.03) and STAT3 (ρ = 0.32, p < 0.04) are positively correlated with higher Tumor Grade. Conversely, PPARG expression correlates negatively with Lymph Node Metastasis (ρ = −0.36, p < 0.02).
3.7. Survival Analysis Robustness
The TCGA-CHOL survival cohort comprised 36 patients with 21 death events, resulting in an events-per-variable (EPV) ratio of 2.6 for the full multivariate model (5 hub genes + 3 clinical covariates). Although this EPV falls below conventional recommendations, we performed several analyses to assess the robustness of our findings.
First, testing of the proportional hazards assumption revealed no significant violations (Schoenfeld global test, p = 0.32; all individual covariates p > 0.10), and log-minus-log plots confirmed parallel curves (Figure 5).
Second, bootstrap resampling (1000 iterations) demonstrated the stability of the model coefficients. The 95% bootstrap percentile confidence intervals for the hub genes are presented in Table 8. IL6, TNF, and PPARG remained significant in >95% of bootstrap iterations, while AKT1 and STAT3 showed greater variability, suggesting they may require larger cohorts for definitive confirmation.
Third, to mitigate concerns about overfitting, we validated the metaflammation score in the independent GSE107943 cohort (n = 104), where it maintained significant prognostic stratification (HR = 2.1, 95% CI: 1.4–3.1, p = 0.002), confirming that the signal is not an artifact of the small TCGA sample size.
3.8. Protein-Level Validation Confirms Transcriptomic Dysregulation
Immunohistochemical validation using the Human Protein Atlas confirmed the dysregulation of key hub proteins in cholangiocarcinoma (CCA) tissue compared with normal bile duct epithelium (Figure 6A and Table 9). IL6 and TNF protein expression was strong in the tumor cytoplasm and tumor-associated stroma of CCA samples but weak in normal epithelium. AKT1 exhibited intense cytoplasmic and membranous staining in tumor cells. Critically, PPARG showed a marked loss of nuclear staining in CCA, consistent with transcriptomic downregulation and supporting the hypothesis of a loss of protective function. Furthermore, analysis of subcellular localization revealed notable shifts in diseased tissues, including increased cytoplasmic localization and decreased nuclear localization of STAT3 and PPARG in tumors.
3.9. A Derived Metaflammation Score Is a Robust Prognostic Biomarker
To translate the network findings into a clinically applicable metric, we constructed a quantitative metaflammation score based on the expression of five key hub genes (IL6, TNF, AKT1, STAT3, and PPARG), as described in the Methods section. Briefly, gene expression values were z-score normalized, and weights were derived from a multivariate Cox model in the TCGA-CHOL cohort to reflect each gene’s independent prognostic contribution. The resulting score formula was:
In the TCGA-CHOL cohort, this score effectively stratified patients into low-, intermediate, and high-risk groups, with median overall survival (OS) of 35.4, 24.1, and 16.2 months, respectively (HR for high-risk vs. low-risk = 2.8; 95% CI: 1.8–4.3; p < 0.001) (Figure 6B). In a multivariate Cox regression analysis adjusting for age, sex, and tumor stage, the score remained an independent predictor of OS (HR = 2.2, p < 0.001). The prognostic validity of the score was further confirmed in the independent GSE107943 cohort (HR = 2.1, p = 0.002), with a combined concordance index (C-index) of 0.68 across datasets. Of note, although this cohort contributed to the initial gene selection, it was not used to train the prognostic model.
4. Discussion
This study constitutes the first integrative, multi-database analysis to elucidate the molecular interconnectivity between T2D, HBV infection, and CCA within the conceptual framework of metaflammation. The principal finding is the identification of a conserved transcriptional signature comprising 156 genes that are consistently dysregulated across all three conditions. Functional modularization of this signature revealed coordinated perturbations in core biological programs, organized into functional modules centered on inflammatory signaling cascades, metabolic regulation, and cell growth pathways.
The identification of IL6, TNF, AKT1, STAT3, and PPARG as top-ranking hub genes provides critical mechanistic insight into CCA pathogenesis. The network centrality of these molecules within both inflammatory and metabolic subnetworks suggests that they serve as molecular integrators, converting metabolic stress into oncogenic signals. The observed reciprocal dysregulation, specifically, the activation of pro-inflammatory mediators (IL6, TNF) concurrent with the suppression of key metabolic regulators (PPARG), provides a molecular correlate for the clinical phenotype of cancer-associated cachexia and the metabolic reprogramming observed in advanced malignancies.
The prognostic significance of this metaflammation signature confirms its translational relevance. Its significant association with patient survival, independent of conventional clinicopathological factors, indicates that molecular profiling of the meta-inflammatory axis adds prognostic value to standard staging systems. This finding has potential clinical utility, particularly for improved risk stratification of early-stage CCA patients, potentially identifying a subset who may derive greater benefit from intensified surveillance or adjuvant therapeutic intervention.
4.1. Metaflammation as a Unifying Pathogenic Mechanism
Our findings characterize CCA in the context of T2D and HBV co-morbidity as a metaflammatory malignancy. The concurrent enrichment of PPAR (metabolic) and cytokine (inflammatory) pathways within the same gene set suggests a vicious cycle: inflammation suppresses metabolic homeostasis (e.g., via downregulation of PPARG), while metabolic dysfunction (e.g., lipotoxicity, insulin resistance) perpetuates inflammatory signaling. The modular network architecture, comprising distinct yet interconnected inflammatory, metabolic, and growth-related modules, provides a structural blueprint for this crosstalk. Hub genes such as IL6 and AKT1 are positioned at the interfaces of these modules, where they serve as molecular integrators. Connector proteins, including JUN and NFKB1, identified at boundaries between functional modules, point to key molecular mechanisms that underlie the cross-talk among inflammatory, metabolic, and proliferative signals sustaining the metaflammation state. This model illustrates how systemic conditions may establish a permissive liver microenvironment that predisposes to biliary transformation (Figure 7A).
4.2. Comparison with Existing Literature
The findings of this systems-level analysis both corroborate and refine established oncogenic paradigms while resolving contextual discrepancies reported in prior research. Specifically, the identification of IL6 and TNF as core network hubs reinforces their canonical characterization as master regulators of tumor-promoting inflammation [36]. Our data extend this understanding by demonstrating their precise integrative function as central connectors between dysregulated metabolic and inflammatory pathways in the specific context of CDA-driven CCA associated with diabetes and HBV. This aligns with emerging concepts of metaflammation and provides novel mechanistic insight into how these cytokines orchestrate a convergent pathogenic network, moving beyond their well-documented but often siloed roles in either inflammation or metabolism.
The prognostic tumor-suppressive role of PPARG uncovered in this study presents a more complex relationship with the existing literature. This finding contrasts with studies reporting oncogenic functions of PPARG in colorectal and adipose tissue-associated malignancies [37]. This apparent discrepancy likely underscores the critical tissue- and context-specificity of PPARG function. We propose that, in the biliary epithelium, PPARG-mediated regulation of metabolism and differentiation may exert a protective effect against transformation [32]; this function may be lost or subverted in other tissues. Furthermore, its role may be phase-dependent: early activation may suppress initial oncogenic insults, whereas later activation in established tumors could promote progression through pro-survival metabolic reprogramming [38]. Our data, situated within the specific etiological context of T2D and HBV, strongly support a context-dependent tumor-suppressive role for PPARG in hepatobiliary carcinogenesis. Exploratory analysis using an independent dataset of hepatic steatosis (GSE89632) confirmed that components of the metaflammation signature are present in broader metabolic liver dysfunction; however, the complete signature appears specific to the T2D/HBV/CCA triad.
Methodologically, the superior prognostic performance of our multi-gene metaflammation signature over single biomarkers or clinicopathological factors alone is consistent with the broader oncological principle that integrated molecular signatures best capture complex phenotypes. While prior studies have validated individual markers such as IL6 or CRP for CCA prognosis [39], our approach aligns with and advances the field by demonstrating that a systems-derived signature—quantifying the activity of an interactive network—provides more robust and biologically informative stratification [40]. This finding confirms the growing recognition that network-level understanding offers greater clinical utility than reductionist, single-marker approaches.
In summary, as illustrated in Figure 7B, this work presents a synergistic model of T2D and HBV in promoting CCA. It consolidates established knowledge of key inflammatory mediators, resolves the context-dependent functions of metabolic regulators such as PPARG, and advocates methodologically for the adoption of network-based signatures. Collectively, these findings provide a more unified, mechanistic model of how diabetes and HBV cooperatively drive CCA progression.
4.3. Therapeutic Implications and Drug Repurposing Potential
The hub genes identified in this study represent immediate therapeutic targets (Figure 5A). Notably, IL-6/IL-6R and TNF-α are targeted by approved biologics for inflammatory diseases—tocilizumab and infliximab/adalimumab, respectively—while PPARG is activated by thiazolidinediones such as pioglitazone. Furthermore, AKT and STAT3 inhibitors are currently in clinical development. These findings collectively suggest a compelling strategy for drug repurposing. Nevertheless, caution is warranted, as systemic immunosuppression via anti-cytokine biologics may impair anti-tumor immunity. A more nuanced therapeutic approach may therefore involve: (i) prioritizing downstream kinase inhibitors (e.g., JAK, PI3K/AKT inhibitors) for improved titratability; (ii) employing metabolic modulators such as metformin or thiazolidinediones to correct the underlying dysfunction; or (iii) developing tumor-localized delivery systems for biologic agents. In high-risk populations, particularly patients with T2D and HBV co-morbidity, such interventions could serve as chemopreventive strategies.
4.4. Clinical Translation: Biomarkers and Risk Stratification
The derived metaflammation score shows promise as a prognostic biomarker and requires validation in truly independent, prospectively collected cohorts. Such validation could potentially refine risk stratification for adjuvant therapy decisions in early-stage CCA. Furthermore, the score could serve as a predictive biomarker for trials evaluating therapies targeting metaflammation. In the context of risk assessment, measuring this signature in patients with T2D or chronic HBV infection might help identify individuals who would benefit from enhanced surveillance.
4.5. Limitations and Future Directions
This study has several limitations. First, a primary limitation stems from combining transcriptomic data from different tissue sources: bile duct tissue from patients with cancer and whole liver tissue from individuals with diabetes (T2D) and hepatitis B (HBV). Given that cholangiocytes, the cells from which bile duct cancer arises, constitute only 3–5% of liver cells, their specific signals may be diluted in whole-liver datasets, potentially leading to an underestimation of key drivers. Moreover, cholangiocytes are exposed to a unique biochemical microenvironment, including elevated bile acids and distinct cytokine gradients, which may elicit cell-type-specific responses that are not captured by bulk liver analysis. Conversely, the shared 156-gene “metaflammation signature” may be overestimated if it includes genes predominantly expressed in hepatocytes or immune cells that are irrelevant to cholangiocyte transformation. Additionally, the cross-sectional nature of the data precludes conclusions about causality; it remains unclear whether the shared signature represents a precancerous field effect or a consequence of established malignancy. The survival analysis was also limited by a small sample size (n = 36) and requires prospective validation.
Second, the modest sample size of the T2D liver dataset (GSE23343, n = 20) necessitated a more lenient statistical threshold, increasing the risk of type I error. Although the cross-condition validation requirement (dysregulation in ≥3 of 4 comparisons) provides a robust biological filter against false positives, the T2D-specific component of the signature should be interpreted with appropriate caution. Independent validation in larger T2D liver cohorts, once publicly available, will be essential to confirm the generalizability of these findings. Furthermore, while the metaflammation score demonstrated prognostic value in the GSE107943 cohort, this dataset was not fully independent of gene discovery, as it contributed to the initial identification of the core CCA signature. Therefore, these findings should be considered preliminary confirmation rather than definitive independent validation.
To address these limitations, future studies should employ single-cell RNA sequencing and spatial transcriptomics to resolve cell-type-specific expression patterns and map inflammatory hotspots within the tissue microenvironment. Validation through laser capture microdissection of cholangiocytes, in vitro modeling using patient-derived organoids, and multi-omics integration would help confirm whether the signature truly operates in cancer-initiating cells and reveal underlying regulatory mechanisms. Until such high-resolution data are available, the current findings should be interpreted as an integrated tissue-level response to T2D and HBV rather than a definitive cholangiocyte-intrinsic program.
5. Conclusions
In conclusion, this systems biology approach defines metaflammation as a key mechanistic link between T2D, HBV, and CCA. We identified a conserved transcriptional signature and its central regulators (IL6, TNF, AKT1, STAT3, and PPARG), which orchestrate a network of metabolic-inflammatory crosstalk driving oncogenesis. Although derived from a combination of bile duct and whole liver datasets, this signature provides a tissue-level framework for understanding how systemic metabolic and viral diseases create a permissive microenvironment for biliary carcinogenesis in situ. Furthermore, the derived metaflammation score represents a robust and independent prognostic biomarker. Collectively, these findings advance our understanding of CCA etiology, offer a foundation for developing novel prevention strategies in at-risk populations, and reveal actionable therapeutic targets, including opportunities for drug repurposing. Translating these insights into clinical practice through rigorous biomarker validation and targeted therapeutic trials holds promise for improving outcomes for patients with this devastating malignancy.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Bridgewater J. Galle P.R. Khan S.A. Llovet J.M. Park J.W. Patel T. Pawlik T.M. Gores G.J. Guidelines for the diagnosis and management of intrahepatic cholangiocarcinoma J. Hepatol.2014601268128910.1016/j.jhep.2014.01.02124681130 · doi ↗ · pubmed ↗
- 2Petrick J.L. Florio A.A. Znaor A. Ruggieri D. Laversanne M. Alvarez C.S. Ferlay J. Valery P.C. Bray F. Mc Glynn K.A. International trends in hepatocellular carcinoma incidence, 1978–2012 Int. J. Cancer 202014731733010.1002/ijc.3272331597196 PMC 7470451 · doi ↗ · pubmed ↗
- 3O’Keefe S.J. Diet, microorganisms and their metabolites, and colon cancer Nat. Rev. Gastroenterol. Hepatol.20161369170610.1038/nrgastro.2016.16527848961 PMC 6312102 · doi ↗ · pubmed ↗
- 4Colangelo M. Di Martino M. Polidoro M.A. Forti L. Tober N. Gennari A. Pagano N. Donadon M. Management of intrahepatic cholangiocarcinoma: A review for clinicians Gastroenterol. Rep.202513 goaf 00510.1093/gastro/goaf 005PMC 1176968139867595 · doi ↗ · pubmed ↗
- 5Clements O. Eliahoo J. Kim J.U. Taylor-Robinson S.D. Khan S.A. Risk factors for intrahepatic and extrahepatic cholangiocarcinoma: A systematic review and meta-analysis J. Hepatol.2020729510310.1016/j.jhep.2019.09.00731536748 · doi ↗ · pubmed ↗
- 6Abdelhamed W. El-Kassas M. Hepatitis B virus as a risk factor for hepatocellular carcinoma: There is still much work to do Liver Res.20248839010.1016/j.livres.2024.05.00439959873 PMC 11771266 · doi ↗ · pubmed ↗
- 7Liu W. Zhang X. Deng Y. Wang D. Li H. Unfolding H Bx for an epigenetic switch of HBV ccc DNA minichromosome Protein Cell 20251675376310.1093/procel/pwaf 021 · doi ↗
- 8Ritchie M.D. Holzinger E.R. Li R. Pendergrass S.A. Kim D. Methods of integrating data to uncover genotype-phenotype interactions Nat. Rev. Genet.201516859710.1038/nrg 386825582081 · doi ↗ · pubmed ↗
