Information retrieval in single cell chromatin analysis using TF-IDF transformation methods
Mehrdad Zandigohar, Yang Dai

TL;DR
This paper evaluates transformation and dimension reduction methods for high-dimensional, sparse scATAC-seq data, including TF-IDF, SVD, and autoencoders, with the goal of improving cell clustering and feature extraction.
Contribution
It provides a comprehensive comparison of transformation techniques for scATAC-seq data, highlighting the effectiveness of TF-IDF and its impact on downstream analysis.
Findings
TF-IDF improves clustering accuracy
TF-IDF enhances biologically relevant feature extraction
Autoencoders benefit from TF-IDF transformation
Abstract
Single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) measures genome-wide chromatin accessibility in thousands of cells, revealing regulatory landscapes at high resolution. However, the data are high-dimensional and sparse, which makes analysis challenging. Several methods have been developed to address this. They include the term-frequency inverse-document-frequency (TF-IDF) transformation and dimension reduction methods such as singular value decomposition (SVD), factor analysis, and autoencoders. Yet these methods have not been compared comprehensively, and the best practice for analyzing scATAC-seq data remains unclear. We compared several scenarios for transformation and dimension reduction, as well as SVD-based feature analysis, to investigate potential improvements in scATAC-seq information retrieval. Additionally, we…
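To make the TF-IDF plus SVD pipeline concrete, here is a minimal sketch using NumPy. Several assumptions apply. The input is a dense cells × peaks count matrix. The TF-IDF variant is one common choice: binarized counts, row-normalized term frequency, and a log-scaled inverse document frequency. The paper compares several scenarios, and this sketch does not reproduce any specific one. The function names `tfidf`, `lsi`, and `top_peaks` are hypothetical. `top_peaks` gives a simple illustration of SVD-based feature analysis: it ranks peaks by the absolute value of their loading on each component.

```python
import numpy as np


def tfidf(counts: np.ndarray) -> np.ndarray:
    """TF-IDF on a cells x peaks matrix (one common variant, assumed here)."""
    binary = (counts > 0).astype(float)                   # binarize accessibility
    cell_totals = binary.sum(axis=1, keepdims=True)       # accessible peaks per cell
    tf = binary / np.maximum(cell_totals, 1.0)            # term frequency, guards empty cells
    n_cells = binary.shape[0]
    peak_cells = binary.sum(axis=0, keepdims=True)        # cells in which each peak is open
    idf = np.log1p(n_cells / np.maximum(peak_cells, 1.0))  # rarer peaks get larger weights
    return tf * idf


def lsi(counts: np.ndarray, n_components: int):
    """TF-IDF followed by truncated SVD (latent semantic indexing)."""
    x = tfidf(counts)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    embedding = u[:, :n_components] * s[:n_components]  # cell coordinates
    loadings = vt[:n_components]                          # peak loadings per component
    return embedding, loadings


def top_peaks(loadings: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k peaks with the largest |loading| for each component."""
    return np.argsort(-np.abs(loadings), axis=1)[:, :k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(0.2, size=(50, 200))             # toy sparse count matrix
    emb, load = lsi(counts, n_components=5)
    print(emb.shape)                                      # (50, 5)
    print(top_peaks(load, k=3))
```

The cell embedding is what a clustering step would consume. Many pipelines discard the first component after TF-IDF and SVD because it often tracks sequencing depth rather than biology. Whether to do so is a separate design choice that this sketch leaves open. For real data, a sparse matrix and a randomized or truncated SVD would replace the dense `np.linalg.svd` call.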
Taxonomy
Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Genomics and Chromatin Dynamics
