# Assigning Transcriptomic Subtypes to Chronic Lymphocytic Leukemia Samples Using Nanopore RNA-Sequencing and Self-Organizing Maps

**Authors:** Arsen Arakelyan, Tamara Sirunyan, Gisane Khachatryan, Siras Hakobyan, Arpine Minasyan, Maria Nikoghosyan, Meline Hakobyan, Andranik Chavushyan, Gevorg Martirosyan, Yervand Hakobyan, Hans Binder

PMC · DOI: 10.3390/cancers17060964 · Cancers · 2025-03-13

## TL;DR

This study uses nanopore sequencing and machine learning to classify chronic lymphocytic leukemia into subtypes linked to patient survival, offering a cost-effective diagnostic tool.

## Contribution

Integrating in-house nanopore sequencing with public Illumina data and machine learning enables transcriptomic subtyping of CLL in resource-limited settings.

## Key findings

- CLL transcriptomic subtypes are associated with survival independent of mutations or gender.
- Nanopore sequencing combined with public data and machine learning enables cost-effective molecular subtyping.
- Disrupted gene modules in CLL include T cell cytotoxicity, proliferation, and splicing pathways.

## Abstract

Chronic lymphocytic leukemia (CLL) is a type of blood cancer where accurate subtyping can enhance diagnosis and treatment. In this study, we integrated nanopore sequencing data with publicly available Illumina datasets and applied machine learning to identify distinct molecular subtypes of CLL. These subtypes were linked to patient survival, independent of genetic mutations or gender. Our findings suggest that combining nanopore sequencing with machine learning provides a cost-effective approach to classifying CLL cases and improving personalized treatment strategies, supporting more accessible CLL care in resource-limited settings.

Background/Objectives: Massively parallel sequencing technologies have advanced chronic lymphocytic leukemia (CLL) diagnostics and precision oncology. Illumina platforms, while offering robust performance, require substantial infrastructure investment and a large number of samples for cost-efficiency. Conversely, third-generation long-read nanopore sequencing from Oxford Nanopore Technologies (ONT) can significantly reduce sequencing costs, making it a valuable tool in resource-limited settings. However, nanopore sequencing faces challenges with lower accuracy and throughput than Illumina platforms, necessitating additional computational strategies. In this paper, we demonstrate that integrating publicly available short-read data with in-house generated ONT data, along with the application of machine learning approaches, enables the characterization of the CLL transcriptome landscape, the identification of clinically relevant molecular subtypes, and the assignment of these subtypes to nanopore-sequenced samples.

Methods: Public Illumina RNA sequencing data for 608 CLL samples were obtained from the CLL-Map Portal. CLL transcriptome analysis, gene module identification, and transcriptomic subtype classification were performed using the oposSOM R package for high-dimensional data visualization with self-organizing maps. Eight CLL patients were recruited from the Hematology Center After Prof. R. Yeolyan (Yerevan, Armenia). Sequencing libraries were prepared from blood total RNA using the PCR-cDNA sequencing-barcoding kit (SQK-PCB109) following the manufacturer’s protocol and sequenced on an R9.4.1 flow cell for 24–48 h. Raw reads were converted to TPM values. These data were projected into the SOMs space using the supervised SOMs portrayal (supSOM) approach to predict the SOMs portrait of new samples using support vector machine regression.

Results: The CLL transcriptomic landscape reveals disruptions in gene modules (spots) associated with T cell cytotoxicity, B and T cell activation, inflammation, cell cycle, DNA repair, proliferation, and splicing. A specific gene module contained genes associated with poor prognosis in CLL. Accordingly, CLL samples were classified into T-cell cytotoxic, immune, proliferative, splicing, and three mixed types: proliferative–immune, proliferative–splicing, and proliferative–immune–splicing. These transcriptomic subtypes were associated with survival orthogonal to gender and mutation status. Using supervised machine learning approaches, transcriptomic subtypes were assigned to patient samples sequenced with nanopore sequencing.

Conclusions: This study demonstrates that the CLL transcriptome landscape can be parsed into functional modules, revealing distinct molecular subtypes based on proliferative and immune activity, with important implications for prognosis and treatment that are orthogonal to other molecular classifications. Additionally, the integration of nanopore sequencing with public datasets and machine learning offers a cost-effective approach to molecular subtyping and prognostic prediction, facilitating more accessible and personalized CLL care.
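The Methods mention that raw reads were converted to TPM values before projection into the SOM space. The abstract does not say which formula or tool was used, so the following is only a minimal sketch of the standard TPM definition (length-normalized counts rescaled to sum to one million). The function name `counts_to_tpm`, the gene names, and the example numbers are hypothetical and do not come from the paper. For long-read cDNA data, where one read often corresponds to one transcript, length normalization is sometimes skipped; the sketch allows for that by making the lengths optional.

```python
def counts_to_tpm(counts, lengths_kb=None):
    """Convert per-gene read counts to transcripts per million (TPM).

    counts     -- dict mapping gene name to read count (hypothetical input)
    lengths_kb -- optional dict mapping gene name to length in kilobases;
                  if omitted, every gene is given length 1.0, which reduces
                  TPM to counts scaled to one million (an assumption here,
                  not a statement about the paper's pipeline)
    """
    if lengths_kb is None:
        lengths_kb = {gene: 1.0 for gene in counts}

    # Step 1: reads per kilobase for each gene.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}

    # Step 2: rescale so that the values of one sample sum to 1,000,000.
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Illustrative values only; these are not data from the study.
example_counts = {"GENE_A": 30, "GENE_B": 10, "GENE_C": 60}
example_lengths_kb = {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 2.0}
tpm = counts_to_tpm(example_counts, example_lengths_kb)
```

Because TPM values sum to the same total in every sample, they put samples of different sequencing depth on a common scale, which matters when low-throughput nanopore samples are compared with deeper Illumina reference data.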

## Linked entities

- **Diseases:** Chronic lymphocytic leukemia (MONDO:0004948), CLL (MONDO:0004948)

## Full-text entities

- **Diseases:** T cell cytotoxicity (MESH:D016399), CLL (MESH:D015451), inflammation (MESH:D007249)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11940626/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11940626/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC11940626/full.md

---
Source: https://tomesphere.com/paper/PMC11940626