# A hybrid neighborhood enhanced contrastive learning and self-knowledge distillation method for scRNA-seq data clustering analysis

**Authors:** Lihua Qi, Peng Wang, Hao Liu, Chen Chen, Jin Gu, Cheng Chen

PMC · DOI: 10.1093/bioinformatics/btag084 · 2026-03-29

## TL;DR

This paper introduces scKD, a new method for analyzing single-cell RNA sequencing data that improves cell type classification accuracy and robustness.

## Contribution

The novel scKD method combines hybrid neighborhood contrastive learning with self-knowledge distillation for improved clustering of scRNA-seq data.
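The two ingredients named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the k-nearest-neighbor choice of positives, the InfoNCE-style loss, the KL-based distillation term, and every function name and temperature value below are assumptions made for illustration, and the sketch does not attempt to reproduce whatever the "hybrid" neighborhood construction is in the paper.

```python
import numpy as np


def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def knn_mask(x, k):
    # Boolean (n, n) mask: True where cell j is among the k nearest
    # neighbors of cell i (Euclidean distance, self excluded).
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    mask = np.zeros(d.shape, dtype=bool)
    mask[np.arange(len(x))[:, None], idx] = True
    return mask


def neighborhood_contrastive_loss(z, pos_mask, temperature=0.5):
    # InfoNCE-style loss (assumed form): each cell is pulled toward the
    # cells marked in pos_mask and pushed away from all other cells.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    self_mask = np.eye(len(z), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # drop self-similarity
    log_prob = log_softmax(sim, axis=1)
    pos = pos_mask & ~self_mask
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0  # skip anchors that have no positives
    summed = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(-(summed[valid] / pos_count[valid]).mean())


def self_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened cluster assignments
    # (assumed form). In self-distillation the "teacher" is the same
    # network, e.g. an earlier snapshot or a deeper head.
    log_p_s = log_softmax(student_logits / temperature)
    log_p_t = log_softmax(teacher_logits / temperature)
    p_t = np.exp(log_p_t)
    kl = (p_t * (log_p_t - log_p_s)).sum(axis=1).mean()
    return float(kl * temperature ** 2)
```

In a training loop, the two terms would typically be combined as a weighted sum, with the weight treated as a hyperparameter; the source summary does not state how scKD balances them.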

## Key findings

- scKD outperforms existing methods in both cell subpopulation identification and clustering stability.
- The method accurately detects both major and rare cell types in real-world datasets.
- scKD demonstrates robustness and adaptability across different biological contexts.

## Abstract

Single-cell heterogeneity analysis faces significant challenges due to the high dimensionality, complexity, and noise inherent in scRNA-seq data, especially when aiming for precise cell type classification. Existing analytical methods often exhibit limited generalization ability and adaptability across different biological contexts, leading to biased identification of cell subpopulations and hindering a comprehensive understanding of diseases, therapeutic responses, and biological processes.

To address these issues, we propose a novel method named scKD, which integrates a hybrid neighborhood-enhanced contrastive learning model with a self-knowledge distillation strategy. scKD enhances clustering accuracy and is capable of accurately identifying both major cell types and rare cell subtypes. Extensive evaluations on multiple real-world datasets demonstrate that scKD achieves superior performance in subpopulation identification, clustering stability, and robustness. These results suggest that scKD is a powerful and reliable tool for analyzing single-cell transcriptomic data, facilitating deeper insights into cellular heterogeneity.

All datasets used in this study are publicly available. Detailed information about all the single-cell datasets analyzed in this paper is provided in Supplementary Table 1. All datasets can be accessed at https://zenodo.org/records/15412380. The source code is available at https://github.com/A-qlh/sckd.

Supplementary data are available at Bioinformatics online.

## Full-text entities

- **Diseases:** tumor (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC13033185/full.md

---
Source: https://tomesphere.com/paper/PMC13033185