Confident Clustering via PCA Compression Ratio and Its Application to Single-cell RNA-seq Analysis
Yingcong Li, Chandra Sekhar Mukherjee, Jiapeng Zhang

TL;DR
This paper introduces a confident clustering method that improves accuracy and stability in single-cell RNA-seq data analysis by reducing the influence of boundary data points through a PCA compression ratio approach.
Contribution
The paper presents a novel confident clustering algorithm that enhances clustering accuracy and stability, especially for biological data with boundary points, using PCA compression ratio.
Findings
High accuracy on tested single-cell RNA-seq datasets
Stable results across different parameter choices
Effective reduction of boundary data point influence
Abstract
Unsupervised clustering algorithms for vectors have been widely used in machine learning. Many applications, including the biological data studied in this paper, contain boundary datapoints that show a combination of properties of two underlying clusters and can lower the performance of traditional clustering algorithms. We develop a confident clustering method that aims to diminish the influence of these datapoints and improve the clustering results. Concretely, for a list of datapoints, we give two clustering results. The first-round clustering attempts to classify only pure vectors with high confidence. Based on it, we classify more vectors with less confidence in the second round. We validate our algorithm on single-cell RNA-seq data, a powerful and widely used tool in biology. Our confident clustering shows high accuracy on our tested datasets. In…
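The two-round idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the PCA compression ratio criterion is not described in this summary, so the sketch swaps in a simple stand-in confidence score (the ratio of each point's distance to its nearest and second-nearest centroid). The function names, the `max_ratio` threshold, the farthest-point initialization, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def pairwise(X, C):
    """Euclidean distances between every row of X and every row of C."""
    return np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)

def farthest_point_init(X, k):
    """Deterministic init (assumption): start at X[0], then repeatedly add
    the point farthest from the centroids chosen so far."""
    idx = [0]
    for _ in range(k - 1):
        d = pairwise(X, X[idx]).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].copy()

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means; returns the final centroids."""
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels = pairwise(X, centroids).argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def two_round_clustering(X, k, max_ratio=0.5):
    """Round 1 labels only 'confident' points (others get -1).
    Round 2 assigns the remaining points to centroids rebuilt from the
    confident points alone. The confidence score is a stand-in, NOT the
    paper's PCA compression ratio."""
    centroids = kmeans(X, k)
    dists = pairwise(X, centroids)
    labels = dists.argmin(axis=1)
    nearest_two = np.sort(dists, axis=1)[:, :2]
    # A small ratio means the point sits clearly inside one cluster.
    ratio = nearest_two[:, 0] / (nearest_two[:, 1] + 1e-12)
    confident = ratio < max_ratio
    round1 = np.where(confident, labels, -1)

    # Rebuild centroids from confident points so boundary points do not skew them.
    pure = np.array([
        X[round1 == j].mean(axis=0) if np.any(round1 == j) else centroids[j]
        for j in range(k)
    ])
    round2 = np.where(confident, labels, pairwise(X, pure).argmin(axis=1))
    return round1, round2

# Toy data (hypothetical): two tight groups plus one point between them.
X = np.array([
    [0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2],      # group A
    [10.0, 0.0], [10.2, 0.0], [10.0, 0.2], [10.2, 0.2],  # group B
    [5.0, 0.1],                                          # boundary point
])
round1, round2 = two_round_clustering(X, k=2)
print(round1)
print(round2)
```

On this toy input the in-between point is left unlabeled (`-1`) in the first round and only receives a label in the second round, after the centroids have been recomputed without it.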
Taxonomy
Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Advanced biosensing and bioanalysis techniques
