Biomedical Big Data and Artificial Intelligence in Blood
Fuhong He, Zhaojun Zhang, Xiangdong Fang, Qian-Fei Wang

The hematopoietic system has long served as an excellent model for biological and medical research, owing to its highly organized hierarchical structure, accessibility for sampling, and rapid cellular turnover. These features have enabled pivotal discoveries in stem cell biology, oncogenic transformation, and targeted therapies, exemplified by milestones such as the identification of the BCR-ABL fusion gene and the successful development of molecular-targeted treatments. Building on these intrinsic advantages, hematology continues to yield critical insights, now propelled by the rapid expansion of high-throughput omics technologies and bioinformatics, and is entering a new era that emphasizes data-driven discovery and intelligent clinical decision-making.
This special issue of Genomics, Proteomics & Bioinformatics, entitled “Biomedical Big Data in Blood”, includes 19 studies that collectively highlight how omics integration and artificial intelligence (AI) technologies are reshaping hematological research (Table 1). These contributions cover resource construction, mechanistic exploration, translational applications, and computational modeling. Owing to space limitations, we describe 10 representative studies below.
Data integration and bioinformatic tools: building the digital foundation
Robust data resources and user-friendly analytical platforms are essential for advancing omics-driven hematology research. This section features four articles that focus on the development of specialized tools and databases addressing diverse cellular contexts, model organisms, and clinical scenarios.
Wang et al. [1] developed HemaScope, an open-source bioinformatics toolkit tailored for analyzing single-cell and spatial transcriptomic data in hematopoietic systems. HemaScope integrates modules for atlas construction, lineage tracing, dynamic transcriptional analysis, and microenvironmental profiling. The toolkit has demonstrated robust performance across multiple scenarios, including bone marrow aging, acute myeloid leukemia (AML), and T-cell lymphoma. Kang et al. [2] introduced HemAtlas, a cross-species, multi-omics database dedicated to hematopoiesis. It incorporates transcriptomic, epigenomic, and spatial transcriptomic data from humans, mice, zebrafish, and in vitro hematopoietic stem and progenitor cell models. HemAtlas enables comparative analyses across developmental stages and tissues, enhancing understanding of hematopoietic development and regeneration.
Zheng et al. [3] established EryDB, a comprehensive erythroid transcriptomic database integrating bulk and single-cell RNA-seq datasets across species and disease conditions. EryDB facilitates comparative exploration of erythropoiesis and identification of dysregulated pathways in erythroid-related disorders. Zhou et al. [4] developed NeoTCR, an immunoinformatic database of experimentally validated neoantigen-specific T-cell receptors (TCRs) from 18 cancer types. NeoTCR offers unified annotations of publicly available neoantigen-specific TCR sequences along with relevant neoantigen information. It also provides a one-stop platform for clonotype discovery, neoantigen annotation, and immunotherapy prediction, bridging sequencing data with precision immuno-oncology.
Collectively, these resources help close existing gaps in data accessibility, analytical capabilities, and clinical integration, laying a digital foundation for mechanistic discoveries and personalized therapeutic strategies.
Omics-guided mechanistic studies: from molecular insights to therapeutic strategies
The convergence of multi-omics technologies and clinical expertise is transforming our understanding of hematologic diseases, bridging molecular mechanisms with therapeutic advances. As mechanistic research enters the big data era, validation and translational foresight become crucial. This section highlights four studies that leverage genomics, transcriptomics, epigenomics, and integrative omics to uncover disease pathways and therapeutic strategies.
Epigenetic reprogramming is recognized as one of the major drivers of hematological malignancies. Zhang et al. [5] identified inhibitors of the histone methyltransferase G9a as candidate drugs for SETD2-deficient leukemia by combining the Connectivity Map with a drug screening platform. Their integrative analysis of transcriptomic and ChIP-seq data showed that G9a inhibition upregulates let-7a-2, potentially via H3K9me2 reduction, thereby suppressing oncogenic MYC signaling. This epigenetic regulation–noncoding RNA–oncogenic axis positions G9a as a promising therapeutic target. Liu et al. [6] revealed that RNA N^6^-methyladenosine (m^6^A) modification affects hematopoietic stem cell differentiation and leukemogenesis. They identified ABCD2 as an m^6^A-regulated driver of AML progression that promotes leukemogenesis by modulating fatty acid metabolism and maintaining leukemia cell viability. These findings highlight the central role of RNA epigenetics in stem cell fate and malignant transformation, paving the way for novel epitranscriptomic therapeutic strategies. Wu et al. [7] reported that lactate generated by the Warburg effect induces histone H3K18 lactylation in T-cell acute lymphoblastic leukemia (T-ALL), marking super-enhancer regions and activating oncogenes such as IGFBP2. This metabolism–epigenetics–transcription axis redefines the oncogenic role of lactate through chromatin remodeling and suggests that targeting lactate-driven H3K18 lactylation could represent a promising therapeutic avenue for T-ALL.
Interestingly, one study exemplifies the synergy between traditional Chinese medicine and modern omics technologies in generating actionable translational insights. Wang et al. [8] applied plasma proteomics to investigate high-altitude-induced myocardial injury and identified ITGA2B as a regulator of IL-6 expression and metabolic reprogramming under hypoxia. Tanshinone IIa, a compound derived from traditional Chinese medicine, reversed ITGA2B-mediated myocardial injury.
Together, these studies underscore the power of combining integrative omics with detailed clinical phenotyping to decode hematologic pathogenesis and guide precision medicine. As the field evolves, sustained collaboration between omics researchers and clinicians remains essential to translate these discoveries into meaningful benefits for patients.
AI empowering: advancing hematological research and clinical translation
The integration of AI into hematological research has enabled advances in both biological understanding and clinical decision-making. This special issue highlights cutting-edge applications of AI models that decode disease complexity and bridge computational predictions with clinical utility. Two contributions featured here present innovative AI-driven strategies for identifying regulatory drivers and refining disease classification.
A et al. [9] developed the DyNDG model, which integrates machine learning and dynamic network modeling. By constructing a time-series multilayer network and applying a random-walk-based propagation framework, DyNDG captures temporal changes in gene interactions and identifies leukemia-related genes with higher accuracy. This strategy enhances the discovery of stage-specific biomarkers and emphasizes the value of temporal dynamics in gene prioritization. Dai et al. [10] presented HematoMap, a platform combining AI and single-cell omics to quantify lineage aberrancy and infer leukemic origins. Using cosine similarity and LASSO regression, HematoMap maps transcriptional deviations from bulk RNA-seq data. This enables personalized risk stratification and links molecular subtypes to clinical outcomes, demonstrating the broad applicability of AI in translational hematology.
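To make the idea of random-walk-based propagation concrete, the following is a minimal sketch of a random walk with restart on a single toy gene network. It is not the DyNDG implementation: the time-series multilayer construction described above is not reproduced, and the function name `random_walk_with_restart`, the `restart` probability of 0.3, and the five-gene chain network are all illustrative assumptions.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate scores from seed genes over a network (illustrative sketch).

    adjacency : square, symmetric, non-negative matrix of gene-gene links
    seeds     : indices of known disease-related genes (hypothetical here)
    restart   : probability of jumping back to the seed set at each step
    """
    adjacency = np.asarray(adjacency, dtype=float)
    # Column-normalize so each column is a probability distribution over neighbors.
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    transition = adjacency / col_sums
    # Start with all probability mass spread evenly over the seed genes.
    p0 = np.zeros(adjacency.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (transition @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance: converged
            return p_next
        p = p_next
    return p

# Toy network: five genes connected in a chain 0-1-2-3-4, with gene 0 as the seed.
toy_adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
scores = random_walk_with_restart(toy_adjacency, seeds=[0])
ranking = np.argsort(-scores)  # candidate genes ordered from highest to lowest score
```

In this toy setting, genes closer to the seed receive higher steady-state scores, which is the intuition behind network-based gene prioritization. A temporal or multilayer variant would additionally need to couple the networks from different time points, a step this sketch deliberately leaves out.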
In summary, these studies exemplify the dual impact of AI: resolving biological complexity through advanced pattern recognition and promoting clinical innovation through predictive modeling.
Outlook: a new era of mechanistic discoveries and intelligent decision-making
This special issue reflects rapid progress in hematology, driven by multi-omics data resources, data integration, and computational tools. Future progress will rely on three pillars: standardized data curation, interpretable intelligent algorithms, and deep investigation of biological mechanisms. To promote openness and reproducibility, we strongly encourage contributors to share their datasets and analytical tools through national infrastructure platforms such as the National Genomics Data Center (NGDC) of China. Such efforts will promote data interoperability and foster collaborative innovation across the hematology research community.
References
- [1] Wang Z, Miao Y, Li H, Cheng W, Shi M, Lv G, et al. HemaScope: a tool for analyzing single-cell and spatial transcriptomics data of hematopoietic cells. Genomics Proteomics Bioinformatics 2025;23:qzaf002. PMID: 39862439. doi: 10.1093/gpbjnl/qzaf002. PMCID: PMC12374577.
- [2] Kang Z, Zhu T, Zou D, Liu M, Zhang Y, Wang L, et al. HemAtlas: a multi-omics hematopoiesis database. Genomics Proteomics Bioinformatics 2025;23:qzaf026. PMID: 40106419. doi: 10.1093/gpbjnl/qzaf026. PMCID: PMC12374576.
- [3] Zheng G, Wu S, Zhang Z, Xin Z, Zhang L, Zhao S, et al. EryDB: a transcriptomic profile database for erythropoiesis and erythroid-related diseases. Genomics Proteomics Bioinformatics 2025;23:qzae029. PMID: 39436241. doi: 10.1093/gpbjnl/qzae029.
- [4] Zhou W, Xiang W, Yu J, Ruan Z, Pan Y, Wang K, et al. NeoTCR: an immunoinformatic database of experimentally-supported functional neoantigen-specific TCR sequences. Genomics Proteomics Bioinformatics 2025;23:qzae010. PMID: 39436255. doi: 10.1093/gpbjnl/qzae010.
- [5] Zhang Y, Xia M, Yi Z, Sui P, He X, Wang L, et al. Integrated computational and functional screening identify G9a inhibitors for SETD2-mutant leukemia. Genomics Proteomics Bioinformatics 2025;23:qzaf035. PMID: 40300107. doi: 10.1093/gpbjnl/qzaf035. PMCID: PMC12373973.
- [6] Liu W, Wang Y, Yao S, Han G, Hu J, Yin R, et al. Reprogramming of RNA m6A modification is required for acute myeloid leukemia development. Genomics Proteomics Bioinformatics 2025;23:qzae049. PMID: 38913865. doi: 10.1093/gpbjnl/qzae049. PMCID: PMC12373641.
- [7] Wu W, Zhang J, Sun H, Wu X, Wang H, Cui B, et al. Glycolysis induces abnormal transcription through histone lactylation in T-lineage acute lymphoblastic leukemia. Genomics Proteomics Bioinformatics 2025;23:qzaf029. PMID: 40193528. doi: 10.1093/gpbjnl/qzaf029. PMCID: PMC12402983.
- [8] Wang Y, Shen P, Wu Z, Tu B, Zhang C, Zhou Y, et al. Plasma proteomic profiling reveals ITGA2B as a key regulator of heart health in high-altitude settlers. Genomics Proteomics Bioinformatics 2025;23:qzaf030. PMID: 40198259. doi: 10.1093/gpbjnl/qzaf030. PMCID: PMC12417084.
