Self-supervised learning to predict intrahepatic cholangiocarcinoma transcriptomic classes on routine histology
Aurélie Beaufrère, Tristan Lazard, Rémy Nicolle, Gwladys Lubuela, Jérémy Augustin, Miguel Albuquerque, Baptiste Pichon, Camille Pignolet, Victoria Priori, Nathalie Théou-Anton, Mickael Lesurtel, Mohamed Bouattour, Kévin Mondet, Jérôme Cros, Julien Calderaro, Thomas Walter

TL;DR
This study develops a self-supervised learning model to predict transcriptomic classes of intrahepatic cholangiocarcinoma from routine histology slides, offering a practical alternative to costly molecular analyses.
Contribution
A self-supervised learning model is introduced to predict iCCA transcriptomic classes from histology slides without manual annotations or large tissue samples.
Findings
The model achieved AUCs of 0.63–0.84 for predicting the four most frequent transcriptomic classes in the discovery set.
Validation on TCGA and French external sets showed AUCs of 0.76–0.80 and 0.62–0.92, respectively.
The model performed particularly well for the hepatic stem-like class with an AUC of 0.84.
Abstract
The transcriptomic classification of intrahepatic cholangiocarcinoma (iCCA) has recently been refined from two to five classes, each associated with pathological features, targetable genetic alterations, and survival outcomes. Despite its potential prognostic and therapeutic value, the transcriptomic classification is not routinely used in practice because of technical limitations, including insufficient tissue material and the high cost of molecular analyses. Here, we assessed a self-supervised learning (SSL) model for predicting iCCA transcriptomic classes on digitised whole-slide images (WSIs) Transcriptomic classes defined from RNA sequencing data were available for all samples. The SSL method (Giga-SSL) was used to train our model on a discovery set of 766 WSIs from 137 biopsies and 109 surgical specimens obtained from 246 patients, using a five-fold cross-validation scheme. The…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCholangiocarcinoma and Gallbladder Cancer Studies · Gene expression and cancer classification · Gallbladder and Bile Duct Disorders
Introduction
Intrahepatic cholangiocarcinoma (iCCA), the second most common primary liver cancers, is characterised by an increasing incidence worldwide and a poor prognosis.1^,^2 Although surgery is the only curative treatment for iCCA, only 20-40% of patients can benefit from surgery because of diagnosis at an advanced stage.3^,^4 For disease not amenable to surgery, systemic treatment – typically gemcitabine and cisplatin ± durvalumab or pembrolizumab in the first-line setting – is recommended, with a median overall survival of 12 months.[5], [6], [7], [8]
Recent advances in the pathobiological and molecular understanding of iCCA have provided prognostic and theranostic factors. Transcriptomic analysis has initially identified two distinct groups of iCCA including an inflammatory class (40% of cases) characterised by activation of inflammatory signalling pathways, and a proliferation class (60% of cases) characterised by the activation of oncogenic signalling pathways.9 Recently, this classification has been refined into five classes. The inflammatory class has been subdivided into two subclasses (inflammatory stroma and immune classical) and the proliferative class into three subclasses (hepatic stem-like, tumour classical and desert-like). Importantly, this classification has been associated with tumour microenvironment characteristics, genetic alterations and prognosis, that may guide the treatment strategy.10 In particular, the most frequent transcriptomic class, the hepatic stem-like class, has been associated with a better prognosis and targetable genetic alterations including IDH1 (isocitrate dehydrogenase 1) mutations (observed in 16% of hepatic stem-like cases compared to 7% in other classes), and FGFR2 (fibroblast growth factor receptor 2) fusions (observed in 13% of hepatic stem-like cases compared to 5% in other classes). To note, specific inhibitors (ivosidenib and pemigatinib, respectively) have been approved by the US Food and Drug Administration as second-line treatments for locally advanced or metastatic iCCA.[11], [12], [13] Moreover, the two inflammatory subclasses may particularly benefit from immunotherapy. Despite its potential relevance, this comprehensive transcriptomic classification is not used in routine practice, mainly because informative data are based on expensive molecular techniques requiring adequate tissue samples rich in tumour cells.
Artificial intelligence (AI) models, particularly deep neural networks, are rapidly emerging in imaging.[14], [15], [16], [17] With the development of digital pathology and wide access to digitised whole slide images (WSIs), AI approaches can be used for various classification tasks, such as diagnostic tasks.18 For instance, AI approaches have demonstrated strong performance in identifying prognostic microscopic features and predicting transcriptomic classifications in hepatocellular carcinoma.[19], [20], [21], [22], [23] However, such deep learning techniques require large datasets (>1,000 slides)24 and heavy computational machinery that limits in-depth studies on stratified datasets. We recently introduced Giga-SSL, a self-supervised learning (SSL) algorithm designed to generate generalist low-dimensional feature vectors of WSIs, which offers both computational efficiency and label-efficiency.25
The main objective of the present study was to predict iCCA transcriptomic classes on WSIs using the Giga-SSL model, with a focus on identifying the most frequent class, namely the hepatic stem-like class.
Patients and methods
Patient samples
The workflow of the study is summarised in Fig. 1. For the discovery set, we selected 246 formalin-fixed paraffin-embedded (FFPE) iCCA cases (109 surgical specimens and 137 biopsies) archived during 2000 and 2021 in the pathology department of Beaujon Hospital (Clichy, France). The material represented 766 hematein eosin saffron (HES) slides, divided into five folds at the patient level for cross-validation. All available slides for the surgical cases (including preoperative biopsies when available, n = 25) were selected for the study (median WSI per case: 5 [range 1-12]). The slides were scanned at 20x magnification with an Aperio scanner (ScanScope AT Turbo).Fig. 1. Flowchart of the study.CV, cross validation; FFPE, formalin-fixed paraffin-embedded; HES, hematein eosin saffron, HES; ROI, region of interest; slide B RNA+, slide from biopsy sample corresponding to the sample used for the transcriptomic analysis; slide S RNA+, slide from surgical sample corresponding to the sample used for the transcriptomic analysis; slide S RNA-, slide from surgical sample, not corresponding to the sample used for the transcriptomic analysis; SSL, self-supervised learning; TCGA, The Cancer Genome Atlas.Fig. 1
For the external validation sets, we used 29 iCCA cases (surgical FFPE samples; 29 WSIs) from The Cancer Genome Atlas cholangiocarcinoma (TCGA-CHOL) public dataset, and 32 surgical FFPE iCCA samples (32 WSIs, each representing the single most representative slide per case) from the Pathology Department of Henri Mondor Hospital (Créteil, France) as the French external validation set.
Written consent was obtained from all patients as required by French legislation. This study was approved by the local ethics committee (IRB 00006477 CER Paris Nord no. CER-2022-168).
The clinical and biological data recorded were age, sex, risk factors for iCCA, tumour size (radiological assessment for biopsy and pathological assessment for surgical specimens), number of tumours and overall survival.
Pathology review
All histological slides were reviewed by an expert liver pathologist (AB) and the assessed tumour features are listed (Table S1; Fig. S1). The stage of fibrosis in the non-tumoural liver when available was evaluated according to the METAVIR staging system.26
RNA sequencing
RNA extraction
RNA sequencing was performed for each case on one representative FFPE tumour block selected from the surgical specimen in both the discovery set and the French external set. Slides corresponding to these blocks (used for transcriptomic analysis) were labelled as surgical slides (S RNA+). In the discovery and the TCGA set, slides from blocks not used for transcriptomic analysis were labelled (S RNA-). For biopsy cases, the FFPE block used for RNA sequencing corresponded directly to the selected slide, which was labelled (B RNA+) (Fig. 1 and S2).
Briefly, 5 μm-thick sections, with macrodissection if necessary to retain only tumour areas, were cut from FFPE blocks. Total RNAs were further isolated by using the Qiagen FFPE RNA extraction kit (RNeasy FFPE kit, Qiagen) for the discovery set and the Recover AllTM Total Nucleic Acid Isolation Kit for the French external validation cohort (Invitrogen, Thermo Fisher Scientific).
Gene expression analysis
Gene expression was analysed using the SMARTer Stranded Total RNA-Seq Kit for the discovery set and QuantSeq 3’ mRNA-seq Kit for the French external validation set. Only genes quantified in at least 50% of samples were retained for the analysis. RNA alignment was performed using a dedicated pipeline (https://github.com/GeNeHetX/RNApipeline). Reads are aligned with the reference genome ensemblv107_GRCh38.75 by STAR27 and gene expression counts obtained with FeatureCount.28 Gene expression profiles were quantile-normalized. The mean expression of each gene-set-defined gene signature was computed following a gene-wise centring in each dataset (without variance scaling). The transcriptomic class with the highest gene set-averaged expression was assigned to each sample. The same process was used with the TCGA dataset.
Slide pre-processing and tessellation
Slides from the discovery set were stained with HES and encoded in svs format. Slides from the external French validation set were stained with HES and encoded in ndpi format. Slides from the TCGA validation set were stained with H&E and encoded in svs format. Tissue regions automatically extracted using Otsu thresholding were then exhaustively split into 2,899,811 patches of 224 × 224 pixels (without overlapping) at 10x using the OpenSlide library in Python.
We present the results in the discovery set according to three pre-processing protocols with or without extraction of the region of interest (ROI), each requiring varying levels of expert pathologist involvement (Fig. 1 and S3):
- 1.No filter: all tiles including tumour and non-tumour are processed as they are (encompassing both tumour and non-tumour regions).
- 2.Manual filter: an expert pathologist (AB) extensively annotates tumour regions using ImageScope software, from which patches are extracted.
- 3.Learning filter: tiles are filtered using logistic regression trained on a dataset of 3,000 tile embeddings, randomly extracted and labelled by an expert pathologist (AB).
Machine learning algorithms
Data split
Training involved using a 5-fold cross-validation framework in the discovery set. Splits were stratified according to the output variable, at the patient level. They were shared among all trainings to ensure the fairness of comparison.
Giga-SSL representations
The Giga-SSL model was trained on a single V100 GPU on the TCGA-FFPE dataset following the training framework provided in our previous study25 with the exception of the following:
- •Training was performed for 100 h, or 7,800 epochs
- •WSI embeddings were ensembled over 100 views, then L2-normalized.
Finally, we used L2-regularised logistic regressions (C=7, max_iter = 10,000, and class_weight set as “balanced”) as end classification models.
MIL baseline algorithms
Beside the Giga-SSL-based classifications, we provided some baseline classification algorithms for comparison. They were based on the deep attention multiple instance learning (MIL) algorithm introduced in the work of Ilse et al.29 and slightly modified by Lazard et al..30 MIL models were trained from scratch and operate on tile embeddings extracted from the last layer of pretrained ResNet18.31 We used ResNet18 pre-trained on imagenet and on the TCGA with MoCo (i.e. the tile-encoder used in the Giga-SSL model).
External dataset inference
The probabilities predicted by the five logistic regression models trained on the training set (each corresponding to a training fold) were averaged, and performance metrics were computed using these pooled probabilities.
Statistical analysis
Continuous variables were compared using Student's t test, and categorical variables were compared using the chi-squared or Fisher’s exact tests. Survival curves were estimated by the Kaplan-Meier method and compared with log-rank statistics. p <0.05 was considered statistically significant (SPSS software). The performance of AI models was assessed using the AUC score, balanced accuracy score and F1 score (macro-average).
Results
Patient characteristics
The main clinical and pathological features of patients and tumours for each dataset are presented in Table 1. At the pathological level, the French external validation set had higher proportions of large duct type and marked immune tumour infiltration than the two other sets (22% vs. 7% and 3%, p = 0.018; 59% vs. 38% and 33%, p = 0.011).Table 1. Clinical and pathological features of the different datasets of the study.Table 1. Discovery set, N = 246French external validation set, n = 32TCGA validation set, n = 29p valuesClinical featuresAge, years (mean)63 [27-88]64 [25-85]63 [29-82]0.810Sex (male/female)141 (57)/105 (43)23 (72)/9 (28)13 (45)/16 (55)0.099Risk factorsHBV28 (11)3(9)NA0.734HCV16 (7)0 (0)NA0.231MS72 (30)5 (16)NA0.141Chronic alcohol intake43 (18)6 (19)NA0.859PSC4 (2)3 (9)NA0.035Other11 (4)1 (3)NA1.000No risk factor72 (29)17 (53)NA0.003****Pathological featuresCirrhosis (F4 METAVIR)24 (10)4 (12)NA0.544Multinodularity67 (27)10 (31)NA0.633Size (mean, cm)7 [1-22]7 [1-16]NA0.604Small duct type214 (87)22 (69)27 (93)0.010Large duct type18 (7)7 (22)1 (3)0.018Well differentiated tumour82 (33)15 (47)12 (41)0.253Moderately differentiated tumour135 (55)14 (44)14 (48)0.426Poorly differentiated tumour29 (12)3 (9)3 (10)1.000Fibrosis (no or mild/moderate or intense)49 (20)/197 (80)4 (12)/28 (88)6 (21)/23 (79)0.723Immune infiltration (no or low/moderate or high)166 (67)/80 (33)13 (41)/19 (59)18 (62)/11 (38)0.011TLS8 (3)5 (16)8 (28)<0.001Necrosis (median, %)18 [0-90]0 [0-60]0 [0-30]<0.001Data are n (%) unless otherwise indicated.NA, data not available.MS, metabolic syndrome; PSC, primary sclerosing cholangitis; TLS, tertiary lymphoid structure; TCGA, The Cancer Genome Atlas.If not available in the TCGA set, the statistical analyses were performed only between the discovery and the French external validation sets.Continuous variables were compared using the Student's t test, and categorical variables were compared using chi-squared or Fisher’s exact tests.
Transcriptomic classes
The proportion of each transcriptomic class in the three datasets is represented in Fig. 2 and Table S2. The most frequent transcriptomic class was the hepatic stem-like class (37% [90/246] in the discovery set, 44% [14/32] in the French external validation set, and 59% [17/29] in the TCGA validation set). The desert-like class was very rare (4% [11/246] in the discovery set, 3% [1/32] in the French external set, and 7% [2/29] in the TCGA validation set). The distribution closely mirrored that reported in the Mount Sinai cohort by Martin-Serrano et al.10 (Table S3). Of note, the distribution of the five transcriptomic classes differed between surgical and biopsy samples in the discovery set (Table S4). The hepatic stem-like class was more frequent in surgical than biopsy samples (49% vs. 27%, *p <*0.001), whereas the immune classical class was more frequent in biopsy samples (31% vs. 14%, *p <*0.002).Fig. 2. Distribution and characterisation according to histological features and overall survival of the five transcriptomic classes.(A) Distribution of each transcriptomic class according to the different datasets used in the study and representative histological images of each transcriptomic class (HES), (B) semi-quantitative assessment of the amount of tumour fibrosis in each transcriptomic class (∗*p <*0.05), (C) Semi-quantitative assessment of the amount of tumour immune infiltration in each transcriptomic class (∗*p <*0.05), Kaplan-Meier curves for overall survival with risk tables according to (D) hepatic stem-like, (E) tumour classical and (F) inflammatory stroma transcriptomic class. Continuous variables were compared using Student's t test, and categorical variables were compared using chi-squared or Fisher’s exact tests. Survival curves were estimated by the Kaplan-Meier method and compared with log-rank statistics.Fig. 2
As expected, transcriptomic classes in all cohorts were associated with pathological features (Fig. 2B,C). Prominent tumour fibrosis was mainly observed in the inflammatory stroma class (38/66, 58%). High tumour immune infiltration was most prevalent in the inflammatory stroma and immune classical classes (19/66 [29%] and 8/69 [16%], respectively). Hepatic stem-like and desert-like classes exhibited low tumour immune infiltration (67/120 [56%] and 7/15 [47%]). The five transcriptomic classes did not differ by tumour histological type (small vs. large duct, p = 0.074) (Fig. S4).
Clinically, overall survival (OS) was significantly associated with three transcriptomic classes in all cohorts. OS was improved in the hepatic stem-like class compared to the other transcriptomic classes (OS median: 49 vs. 35 months, hazard ratio [HR] 0.58; 95% CI 0.44-0.75; *p <*0.001) but was lower in the tumour classical and inflammatory stroma classes compared to the other classes (OS median: 21 vs. 43 months, HR 1.76; 95% CI 1.09-2.82; p = 0.003 and 31 vs. 43 months, HR 1.50; 95% CI 1.04-2.17; p = 0.011, respectively) (Fig. 2D-F).
Using self-supervised WSI representations for transcriptomic class prediction
We initially focused on the binary classification of the most frequent class (hepatic stem-like) before expanding our analysis to the other transcriptomic classes, excluding the desert-like class, which was too rare across the three datasets to develop a reliable predictive model.
Predicting the hepatic stem-like class
Discovery set
Table 2 shows cases of the cross-validated performances of both the Giga-SSL and MIL models in the discovery cohort for the hepatic stem-like binary classification task. The performance of the Giga-SSL model peaked when combined with manual tumour annotation, which resulted in a mean AUC of 0.84. Performance improved when a protocol of ROI extraction was applied to WSIs, regardless of whether the extraction of the tumour tiles was manual or learned. This improvement was particularly notable when using the classic MIL models, with the absence of WSI filtering leading to an 8-point decrease in AUC. For the Giga-SSL models, the absence of WSI filtering led to a 4-point decrease in AUC.Table 2. Cross-validated performance of the Giga-SSL and MIL models on the discovery cohort for the hepatic stem-like binary classification task according to three different pre-processing protocols.Table 2. ModelTile filterAUC scoreBalanced accuracy scoreF1 scoreGiga-SSL∅0.80.720.72M0.840.760.76A0.820.740.75MoCo + MIL∅0.740.670.67M0.820.750.75A0.820.750.74A, learning filter; M, manual filter; ∅, no filter; MIL, multiple instance learning; SSL, self-supervised learning. The best results for each method are presented in bold.
External validation
Table S5 presents the results of the external validation of models trained on all slides of the discovery cohort, with a manual filter applied to the WSIs. Logistic regression trained on the Giga-SSL embeddings of the discovery cohort demonstrated strong transferability to both the French external set (AUC = 0.86) and the TCGA set (AUC = 0.76). Notably, the TCGA cohort slides were stained with H&E, which differs from the staining protocol used for the discovery cohort's slides (HES) and further emphasizes the generalizability of the models.
Influence of the pre-processing ROI extraction protocol
We detailed in Table S5 the external validation results when models were trained according to the three different pre-processing protocols with or without ROI extraction. As observed in the cross-validated experiments on the discovery set, using an ROI extraction method (extraction only of the tumour tiles) was advantageous for both external datasets vs. extraction of both tumour and non-tumour tiles.
Effect of slide composition of the training set
In Fig. 1 and S2, the discovery dataset is illustrated to contain various types of slides, directly (RNA+) or indirectly associated with transcriptomic analysis (RNA-), including both surgical slides (S) and biopsy (B) slides. We aimed to determine how the composition of the training set affected the generalisation performance of classification models. For this, we trained models using training sets with different compositions of S RNA+, S RNA- and B RNA+ slides and monitored the prediction performance on the two external validation sets. The results are provided in Fig. 3. For the French external validation set, the highest performance was achieved when the training set was limited to slides that were directly associated with the transcriptomic analysis (S RNA+ and B RNA+).Fig. 3. Effect of the composition of the training set on the performance of the Giga-SSL model for the hepatic stem-like binary classification task.ROI, region of interest; slide B RNA+, slide from biopsy sample corresponding to the sample used for the transcriptomic analysis; slide S RNA+, slide from surgical sample corresponding to the sample used for the transcriptomic analysis; slide S RNA-, slide from surgical sample, not corresponding to the sample used for the transcriptomic analysis; SSL, self-supervised learning.Fig. 3
Next, we turned to the validation on the TCGA dataset, exclusively consisting of S RNA-slides (i.e. slides for which the transcriptome was analysed on a different tissue block). For this dataset with a putatively noisier ground truth, the model’s performance seemed closely related to the size of the training data rather than its composition (Fig. 3).
Prediction of the other transcriptomic classes
We conducted analogous experiments for predicting other transcriptomic classes including inflammatory stroma, immune classical and tumour classical. As with the hepatic stem-like class, we trained binary classifiers for each class using a patient-level five-fold cross-validation setting and then applied them to the validation sets. We manually set the ROI method and trained the model using the complete dataset (S RNA+, B RNA+, S RNA-). Table 3 presents the results of these experiments. The three classes could be predicted in a cross-validation setting and demonstrated generalisation capabilities on the validation sets.Table 3. Prediction for the four most frequent transcriptomic classes according AUC scores.Table 3. Transcriptomic classesCV results in the discovery setFrench external set (n = 32)TCGA set (n = 29)Hepatic stem-like0.84 ± 0.060.86 (n = 14)0.76 (n = 17)Tumour classical0.77 ± 0.090.88 (n = 3)n.a. (n = 0)Inflammatory stroma0.72 ± 0.100.92 (n = 7)0.80 (n = 5)Immune classical0.63 ± 0.080.62 (n = 7)0.78 (n = 5)CV, cross-validation; n.a., not applicable; TCGA, The Cancer Genome Atlas.
The classification task for the inflammatory stroma class outperforms the others on the validation set, achieving an AUC of 0.92 for the French set and 0.80 for the TCGA set. Despite the absence of tumour classical slides in the TCGA dataset, this class still seemed predictable and showed good generalisation with an AUC of 0.88 on the French set. However, despite some discernible signal for the immune classical class, it was the most challenging to classify. Confusion matrix of the misclassified cases in the whole validation set showed that among the slides wrongly classified as positive, most belong to the hepatic stem-like, immune classical, or inflammatory stroma classes (Table S7).
Discussion
We demonstrate in this study that prediction of iCCA transcriptomic classes, especially hepatic stem-like and inflammatory stroma classes, is achievable and effective using an SSL method applied to routine WSIs. This is particularly relevant as transcriptomic classes are associated with prognosis and expected to impact on the treatment response, particularly regarding immunotherapy and targeted therapies.9^,^10 Furthermore, because the hepatic stem-like group is associated with targetable molecular alterations,10 our model could be used as a pre-screening tool, since even though recent guidelines recommend their investigation, molecular profiling cannot always be carried out in routine practice, mainly for logistical and technical reasons.6 We also confirmed in our cohorts the association between the two inflammatory classes and tumour microenvironment composition (inflammation and/or fibrosis) assessed by a pathologist on HES slides, which may benefit from immune checkpoint inhibitor treatments.10 Consistent with these findings, we recently investigated the tumour microenvironment composition in a cohort of iCCA surgical specimens and showed that the inflammatory stroma class was associated with a CD8-rich infiltrate, whereas the immune classical class correlated with a low proportion of FAP+ fibroblasts.32
Currently, the leading methods for WSI classification rely on MIL.29^,^33 However, annotated datasets are often small, typically a few hundred to a few thousand WSIs, which may result in overfitting and underperforming models, whereas large unannotated datasets of tens of thousands WSI are available. Here, we used a slide level SSL model, called Giga-SSL, allowing us to leverage the large number of WSIs without annotations to infer powerful slide representations.25 Our model surpassed the performance of the standard MIL model in the binary classification task for the hepatic stem-like class. We observed a gain of 2 to 6 points in AUC according to the pre-processing ROI extraction method. This model significantly improves efficiency by increasing speed and reducing the use of computational resources. Creating WSI embeddings from raw SVS or TIFF images takes about 13 s (using the code from https://github.com/trislaz/Democratizing_WSI). After generating the WSI embeddings, all analyses were done with logistic regression and ran smoothly on a standard laptop CPU, showing very low computational demands — the full cross-validation and inference process takes only a few minutes, a major improvement compared to transcriptomic analyses, which can take several weeks.
Our model was trained and validated on three different sets to capture potential heterogeneity in tumour sampling (biopsy and surgical samples, whether or not directly associated with transcriptomic analysis) and staining methods (HE or HES). Indeed, despite being exclusively trained on HES slides, our models could predict the transcriptomic classes on either HES or H&E slides. This is particularly interesting because the routine staining differs among countries and laboratories, leading to colour heterogeneity that could affect the performance of the AI model.34^,^35 In addition, and as expected, our model demonstrated better prediction when applied to tumour tiles rather than to all tiles (including non-tumour tiles), which suggests that the essential information regarding transcriptomic subclasses is contained in the tumour itself. To bypass the time-intensive process of manual annotations by a pathologist (which is a main limitation of using AI models), our data support the use of an automatic learning filter given its close performance to that when using a manual filter, which represents a favourable balance between time invested and classification performance.
Furthermore, we provide a comparative analysis that explores the influence of training set composition according to the type of slides (from biopsy or surgical samples, directly or indirectly associated with transcriptomic analysis) on prediction accuracy. Our findings suggest that intratumoural heterogeneity can negatively affect training when non-consecutive slides are used for either molecular profiling or histological analysis. This discrepancy introduces label noise, in that the molecular class we aim to predict may not align with the tissue captured in the image. Although large datasets are generally required for effective neural network training, our results suggest that smaller datasets with high-confidence labels can outperform larger, noise-prone datasets.
Moreover, Kather et al.36 found that flash-frozen slides yielded better performance in molecular prediction tasks within the TCGA dataset, despite their poorer morphological quality compared to FFPE slides. This anomaly could also be attributed to label noise arising from tumour heterogeneity, because the molecular labels in TCGA are extracted from flash-frozen samples. These insights underscore the importance of using consecutive slides for molecular class prediction and could potentially impact the design of future studies. Finally, though more research is needed to validate the clinical use of such predictive models, patient stratification would benefit from combining the prediction of several WSIs sampled in different blocks, which would help capture the main iCCA class of the tumour and help in exploring intratumoural heterogeneity. Indeed, intratumoural molecular heterogeneity in liver cancers, including iCCA, is a key feature that may explain treatment failure and patient prognosis.37
Importantly, we included both biopsy and surgical samples in the discovery set to better reflect clinical practice. Even though using biopsies might have reduced our model's performance during cross-validation, it improved performance on the French external validation set. Hence, biopsies may provide complementary information to surgical specimen WSIs (see Fig. 3), and increase the robustness of the trained network. Currently, most AI studies of primary liver cancers have focused on surgical samples,16^,^[19], [20], [21] but most patients with iCCA will not undergo surgery, which may introduce a selection bias. Of note, we found transcriptomic classes differentially represented between surgical and biopsy cases, which highlights the importance of working with both tissue samples. A few studies, mainly focusing on diagnostic tasks, have laid the groundwork for using biopsies and have demonstrated that encouraging deep learning-based results can be obtained in this type of sample despite their size.18^,^38^,^39 Thus, our model could serve as a useful molecular screening tool, particularly for biopsies, where molecular analysis is often limited by the small amount of tissue available.
The main limitation of our study is the unbalanced proportion of each transcriptomic class and, in particular, the rarity of the desert-like class, which represents less than 10% of cases in all sets, precluding the development of a predictive model. Learning from a larger number of cases could be beneficial, but finding complete datasets containing survival and transcriptomic data, and histological slides of FFPE iCCA samples is difficult, as evidenced by the low number of cases available in the TCGA dataset. The methodology we applied for the RNA classifier differed slightly from that described by Martin-Serrano et al.,10 owing to the unavailability of the original code. Nonetheless, the distribution of transcriptomic classes obtained in our analysis was highly consistent with that reported by Martin-Serrano et al.,10 thereby suggesting that both approaches capture the same underlying class structure.
We have developed and validated an SSL model able to predict iCCA transcriptomic classes on WSIs from routine biopsy and surgical samples. This model showed good to very good performance for classifying the four main classes. The ability to predict transcriptomic iCCA classes on routine WSIs could affect patient management by predicting prognosis and guiding the treatment strategy (immunotherapy in inflammatory classes or targeted molecular therapies in hepatic stem-like class).
Abbreviations
AI, artificial intelligence; FFPE, formalin-fixed paraffin-embedded; HES, hematein eosin saffron; iCCA, intrahepatic cholangiocarcinoma; MIL, multiple instance learning; SSL, self-supervised learning; TGCA, The Cancer Genome Atlas; WSIs, whole slide images.
Financial support
Nuovo-Soldati Foundation, Appel à projet 10.13039/501100006678AFEF 2021 grant, 10.13039/501100008765SNFGE MAHGE grant.
Authors’ contribution
Study concept and design (AB, JC, KM, VP), acquisition of data (AB, TL, RN, GL, BP, MA, JA, VP, SL, NTA, JC); analysis and interpretation of data (AB, TL, CP, RN, TW, VP); drafting of the manuscript (AB, TL); critical revision of the manuscript for important intellectual content (ML, MB, KM, TW, VP); statistical analysis (AB, TL); obtained funding (AB); study supervision (TW, VP).
Data availability
The binary classification analysis are available in the online repository: https://github.com/trislaz/ICCA_prediction.
The TCGA-CHOL data including gene expression and WSIs are available from NIH GDC Data Portal (https://portal.gdc.cancer.gov/).
The RNA-seq data have been deposited in Gene Expression Omnibus (GEO) (accession no. [GSE244807](GSE244807)).
Patient consent statement and ethics approval
Written consent was obtained from all patients as required by French legislation. This study was approved by the local ethics committee (IRB 00006477 no.CER-2022-168).
Conflict of interest
The authors declare no conflicts of interest pertaining to this manuscript.
Please refer to the accompanying ICMJE disclosure forms for further details.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Global Burden of Disease Liver Cancer Collaboration Akinyemiju T.Abera S.Ahmed M.Alam N.Alemayohu M.A.The burden of primary liver cancer and underlying etiologies from 1990 to 2015 at the global, regional, and national level: results from the global burden of disease study 2015 JAMA Oncol 320171683169110.1001/jamaoncol.2017.305528983565 PMC 5824275 · doi ↗ · pubmed ↗
- 2Rahnemai-Azar A.A.Weisbrod A.Dillhoff M.Intrahepatic cholangiocarcinoma: molecular markers for diagnosis and prognosis Surg Oncol 26201712513710.1016/j.suronc.2016.12.00928577718 · doi ↗ · pubmed ↗
- 3Endo I.Gonen M.Yopp A.C.Intrahepatic cholangiocarcinoma: rising frequency, improved survival, and determinants of outcome after resection Ann Surg 2482008849610.1097/SLA.0b 013e 318176 c 4d 318580211 · doi ↗ · pubmed ↗
- 4Mavros M.N.Economopoulos K.P.Alexiou V.G.Treatment and prognosis for patients with intrahepatic cholangiocarcinoma: systematic review and meta-analysis JAMA Surg 149201456557410.1001/jamasurg.2013.513724718873 · doi ↗ · pubmed ↗
- 5Bridgewater J.Galle P.R.Khan S.A.Guidelines for the diagnosis and management of intrahepatic cholangiocarcinoma J Hepatol 6020141268128910.1016/j.jhep.2014.01.02124681130 · doi ↗ · pubmed ↗
- 6European Association for the Study of the Liver EASL-ILCA clinical practice guidelines on intrahepatic cholangiocarcinoma J Hepatol 202310.1016/j.jhep.2023.03.010S 0168-8278(23)00185-X 37084797 · doi ↗ · pubmed ↗
- 7Oh D.-Y.Ruth He A.Qin S.Durvalumab plus gemcitabine and cisplatin in advanced biliary tract cancer NEJM Evid 1202210.1056/EVI Doa 2200015 EVI Doa 220001538319896 · doi ↗ · pubmed ↗
- 8Kelley R.K.Ueno M.Yoo C.Pembrolizumab in combination with gemcitabine and cisplatin compared with gemcitabine and cisplatin alone for patients with advanced biliary tract cancer (KEYNOTE-966): a randomised, double-blind, placebo-controlled, phase 3 trial Lancet Lond Engl 40120231853186510.1016/S 0140-6736(23)00727-437075781 · doi ↗ · pubmed ↗
