Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis
Kelin Xia

TL;DR
This paper introduces SeqMM, a multiscale model for analyzing Hi-C data that balances Euclidean and genomic distances to reveal chromosome structures and TADs, showing scale-dependent variations and robustness.
Contribution
The paper presents a novel sequence-based multiscale model that distinguishes global and local clustering in chromosome data, providing insights into hierarchical structures and TAD boundary robustness.
Findings
SeqMM reveals differences between global and local clustering scales.
TAD boundaries vary significantly at different scales, especially at small scales.
At larger scales, TAD boundaries become more consistent and robust.
Abstract
In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for biomolecular data analysis. Combining it with spectral graph methods, I reveal the essential difference between global-scale and local-scale models in structure clustering: they optimize different quantities, namely Euclidean (spatial) distances and sequential (genomic) distances. More specifically, clusters from global-scale models optimize Euclidean distance relations, whereas local-scale models yield clusters that optimize genomic distance relations. For biomolecular data, Euclidean distances and sequential distances are two independent variables, which can never be optimized simultaneously in data clustering. However, the sequence scale in my SeqMM can work as a tuning parameter that balances these two variables and delivers different clusterings depending on the purpose. Further, my…
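The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Gaussian kernel, its width `eta`, the sign-of-Fiedler-vector bipartition, and the helper names (`banded_laplacian`, `fiedler_labels`, `count_label_switches`) are all assumptions made for illustration. The idea is to keep only graph edges between beads whose sequence separation is at most a chosen sequence scale, then cluster with the second eigenvector of the graph Laplacian.

```python
import numpy as np

def banded_laplacian(coords, seq_scale, eta=1.0):
    """Graph Laplacian with Gaussian weights on Euclidean distance,
    keeping only pairs with 0 < |i - j| <= seq_scale (assumed construction)."""
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    weights = np.exp(-(dist / eta) ** 2)
    idx = np.arange(n)
    sep = np.abs(idx[:, None] - idx[None, :])
    weights[(sep == 0) | (sep > seq_scale)] = 0.0
    return np.diag(weights.sum(axis=1)) - weights

def fiedler_labels(laplacian):
    """Two-way spectral partition from the sign of the Fiedler vector."""
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

def count_label_switches(labels):
    """Number of positions along the sequence where the cluster label changes."""
    return int(np.sum(labels[1:] != labels[:-1]))

# Toy hairpin (synthetic data, not Hi-C): 20 beads run along x at y = 0,
# then 20 beads fold back along x at y = 1, so bead i sits next to bead 39 - i.
n_half = 20
forward = np.column_stack([np.arange(n_half), np.zeros(n_half), np.zeros(n_half)])
backward = np.column_stack([np.arange(n_half)[::-1], np.ones(n_half), np.zeros(n_half)])
coords = np.vstack([forward, backward]).astype(float)
n = len(coords)

# Local scale: only sequence neighbours are connected.
local_labels = fiedler_labels(banded_laplacian(coords, seq_scale=1))
# Global scale: every pair is connected, weighted by spatial proximity.
global_labels = fiedler_labels(banded_laplacian(coords, seq_scale=n - 1))

local_switches = count_label_switches(local_labels)
global_switches = count_label_switches(global_labels)
print("local scale label switches:", local_switches)
print("global scale label switches:", global_switches)
```

On this toy hairpin the local scale cuts the chain once, giving two sequence-contiguous halves, while the global scale groups beads that are close in space, so the labels switch twice along the sequence. Intermediate values of `seq_scale` interpolate between the two regimes, which is the role the abstract assigns to the sequence scale.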
