A comprehensive benchmark of single-cell Hi-C embedding tools
Dylan Plummer, Xiuyuan Lang, Shanshan Zhang, Yan Li, Jing Li, Fulai Jin

TL;DR
This paper benchmarks 13 tools for analyzing single-cell Hi-C data and finds that data representation and preprocessing are more important than the tools themselves for capturing genome architecture heterogeneity.
Contribution
A new benchmarking framework for scHi-C embedding tools and insights into the impact of data representation and preprocessing.
Findings
No single tool performs best across all datasets under default settings.
Long-range, compartment-scale contacts are more informative for embedding cells from early embryonic stages, while short-range contacts are more informative for resolving cell cycle and tissue complexity.
Deep-learning methods handle sparsity better and are more versatile across resolutions.
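To make the long-range versus short-range distinction concrete, the following is a minimal sketch of splitting one cell's contact map into genomic-distance bands. It is not taken from the paper or from any benchmarked tool: the function name `distance_band`, the 1 Mb bin size, and the 2 Mb and 3 Mb cutoffs are all assumptions chosen only for illustration.

```python
import numpy as np

def distance_band(contact_map, min_dist_bp, max_dist_bp, resolution_bp):
    """Keep contacts whose genomic separation lies in [min_dist_bp, max_dist_bp].

    Hypothetical helper for illustration; cutoffs are not from the paper.
    """
    n = contact_map.shape[0]
    idx = np.arange(n)
    # Separation in base pairs between bin i and bin j.
    sep = np.abs(idx[:, None] - idx[None, :]) * resolution_bp
    mask = (sep >= min_dist_bp) & (sep <= max_dist_bp)
    # Contacts outside the band are set to zero.
    return np.where(mask, contact_map, 0)

# Toy symmetric 6x6 contact map at an assumed 1 Mb resolution.
rng = np.random.default_rng(0)
upper = np.triu(rng.poisson(2.0, size=(6, 6)))
toy = upper + np.triu(upper, k=1).T

# Arbitrary illustrative cutoffs: 0-2 Mb as "short", 3-5 Mb as "long".
short_range = distance_band(toy, 0, 2_000_000, 1_000_000)
long_range = distance_band(toy, 3_000_000, 5_000_000, 1_000_000)
```

Either band can then be passed to an embedding method on its own, which is one simple way to test which distance range carries the signal for a given dataset.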
Abstract
Embedding is the key step in single-cell Hi-C (scHi-C) analysis, which relies on capturing biologically meaningful heterogeneity at various levels of genome architecture. To understand the strengths and limitations of existing tools in various applications, here we use ten scHi-C datasets to benchmark thirteen embedding tools including Va3DE, a new convolutional neural network model that can accommodate large cell numbers. We built a software framework to decouple the preprocessing options of existing tools and found that no single tool works best across all datasets under default settings. The difficulty levels and preferred resolutions differ between benchmark datasets, and the choice of data representation and preprocessing strongly impacts the embedding performance. Embedding cells from early embryonic stages relies on long-range compartment-scale contacts, but resolving cell…
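As a rough illustration of what "embedding" means here, the sketch below flattens each cell's contact map into a feature vector, normalizes by total contacts, and projects the cells with PCA. This is a generic baseline written for this summary, not Va3DE and not any of the thirteen benchmarked tools; the function name `embed_cells`, the normalization choice, and the toy data are all assumptions.

```python
import numpy as np

def embed_cells(contact_maps, n_components=2):
    """Embed cells by PCA on flattened, depth-normalized contact maps.

    contact_maps: array of shape (n_cells, n_bins, n_bins).
    Hypothetical baseline for illustration only.
    """
    # Use the upper triangle (including the diagonal) as the feature vector.
    iu = np.triu_indices(contact_maps.shape[1])
    features = contact_maps[:, iu[0], iu[1]].astype(float)
    # Divide by each cell's total contacts to reduce sequencing-depth effects.
    totals = features.sum(axis=1, keepdims=True)
    features = features / np.maximum(totals, 1.0)
    # Center features, then take the top principal components via SVD.
    features -= features.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy input: 20 cells, 8 bins each, sparse Poisson counts (made-up data).
rng = np.random.default_rng(1)
toy_maps = rng.poisson(1.0, size=(20, 8, 8))
embedding = embed_cells(toy_maps, n_components=2)
```

Swapping the normalization step or the feature representation while keeping the projection fixed mirrors, in miniature, the kind of decoupling of preprocessing from the embedding method that the abstract describes.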
Taxonomy
Topics: Single-cell and spatial transcriptomics · Genomics and Chromatin Dynamics · Cancer Genomics and Diagnostics
