# Revealing the Best Strategies for Rare Cell Type Detection in Multi-Sample Single-Cell Datasets

**Authors:** Zhiwei Ye, Yinqiao Yan, Yuanyuan Yu, Hao Wu

PMC · DOI: 10.3390/genes17010031 · 2025-12-29

## TL;DR

This study compares strategies for detecting rare cell types in multi-sample single-cell RNA sequencing data, finding that batch-corrected pooled analysis works best.

## Contribution

The study systematically benchmarks rare cell detection methods in multi-sample settings and identifies optimal analytical strategies.

## Key findings

- Batch-corrected pooled sample detection outperformed other strategies across methods and datasets.
- scCAD showed the most robust and stable performance among evaluated tools.

## Abstract

Background: Single-cell RNA sequencing (scRNA-seq) enables high-resolution characterization of cellular heterogeneity and provides unique opportunities to identify rare cell populations that may be obscured in bulk transcriptomic data. However, despite the growing interest in rare-cell discovery, most existing detection methods were originally developed for single-sample datasets, and their behavior in multi-sample settings, where batch effects, sample imbalance, and heterogeneous cell-type compositions are common, remains poorly understood. This study aims to systematically evaluate representative rare cell detection methods under multi-sample settings and identify the most effective analytical strategies.

Methods: We performed a comprehensive benchmarking analysis of five widely used rare cell detection tools (CellSIUS, GapClust, GiniClust, scCAD, and SCISSORS) and a scGPT-based rare cell detection method that uses Isolation Forest. Each method was evaluated under three analytical strategies: individual sample detection, pooled sample detection, and batch-corrected pooled sample detection. Performance was assessed across multiple publicly available scRNA-seq datasets using standardized evaluation metrics.

Results: Batch-corrected pooled sample detection consistently achieved the highest overall performance across methods and datasets, whereas individual sample detection produced the weakest results. Among the evaluated tools, scCAD demonstrated the most robust and stable performance across dataset types and analytical conditions.

Conclusions: This study provides a strategy-level comparison of rare cell detection in multi-sample settings. Our findings highlight the importance of batch correction and pooled analysis for improving rare cell detection accuracy and offer practical guidance for selecting optimal methods and analytical workflows in large-scale single-cell transcriptomic studies.
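The three analytical strategies differ only in which cells the detector sees at once and whether sample-level shifts are removed first. The toy sketch below illustrates that difference under strong assumptions. The detector is a simple median/MAD outlier rule, which is not one of the benchmarked tools. The "batch correction" is plain per-sample median centering, standing in for a real integration method. The one-dimensional marker scores are hypothetical values constructed so that a batch shift masks the rare cells. The function names `flag_outliers` and `center_by_sample` are illustrative only.

```python
from statistics import median


def flag_outliers(scores, threshold=3.5):
    """Flag scores whose robust z-score (median/MAD based) exceeds `threshold`.

    A stand-in detector for illustration; not any tool from the benchmark.
    """
    center = median(scores)
    mad = median([abs(s - center) for s in scores])
    if mad == 0:
        return [False] * len(scores)
    return [0.6745 * abs(s - center) / mad > threshold for s in scores]


def center_by_sample(scores):
    """Crude stand-in for batch correction: subtract the sample's median."""
    center = median(scores)
    return [s - center for s in scores]


# Hypothetical marker scores: eight abundant cells plus one rare cell (last
# entry) per sample. Sample B carries a constant batch shift of +4.
sample_a = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 6.0]
sample_b = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.3, 4.7, 10.0]

# Strategy 1: individual sample detection (run the detector per sample).
individual = flag_outliers(sample_a) + flag_outliers(sample_b)

# Strategy 2: pooled sample detection (concatenate, no correction).
pooled = flag_outliers(sample_a + sample_b)

# Strategy 3: batch-corrected pooled detection (correct, then concatenate).
corrected = flag_outliers(center_by_sample(sample_a) + center_by_sample(sample_b))

for name, flags in [("individual", individual),
                    ("pooled", pooled),
                    ("batch-corrected pooled", corrected)]:
    print(f"{name}: flagged cell indices {[i for i, f in enumerate(flags) if f]}")
```

In this constructed example the uncorrected pooled run flags nothing, because the batch shift inflates the spread of the pooled scores, while the corrected pooled run recovers both rare cells. The toy does not reproduce the paper's ranking: here per-sample detection also succeeds, whereas the benchmark found individual sample detection weakest on real data.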

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12840603/full.md

---
Source: https://tomesphere.com/paper/PMC12840603