Dory: Overcoming Barriers to Computing Persistent Homology

Manu Aggarwal; Vipul Periwal

arXiv:2103.05608 · cs.LG · March 23, 2021 · 1 citation

Open Access

TL;DR

Dory is a scalable, memory-efficient algorithm for computing persistent homology. It handles data sets with millions of points, runs faster than most published algorithms, and, applied to genome-wide Hi-C data, detects changes in the topology of the human genome.

Contribution

The paper introduces Dory, a novel algorithm that reduces memory usage and computation time for persistent homology, allowing analysis of datasets with millions of points.

Findings

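01. Dory computes PH faster than most published algorithms.

02. Dory uses significantly less memory than published algorithms, which makes larger data sets tractable.

03. Applied to genome-wide Hi-C data, Dory shows that the topology of the human genome changes significantly upon treatment with auxin.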

Abstract

Persistent homology (PH) is an approach to topological data analysis (TDA) that computes multi-scale topologically invariant properties of high-dimensional data that are robust to noise. While PH has revealed useful patterns across various applications, computational requirements have limited applications to small data sets of a few thousand points. We present Dory, an efficient and scalable algorithm that can compute the persistent homology of large data sets. Dory uses significantly less memory than published algorithms and also provides significant reductions in the computation time compared to most algorithms. It scales to process data sets with millions of points. As an application, we compute the PH of the human genome at high resolution as revealed by a genome-wide Hi-C data set. Results show that the topology of the human genome changes significantly upon treatment with auxin, a…
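To make "multi-scale topological properties" concrete, here is a minimal sketch of the simplest case: 0-dimensional persistent homology (connected components) of a Vietoris-Rips filtration, computed with a union-find structure. This is a textbook illustration, not Dory's algorithm; the function name `h0_persistence` and the toy input are assumptions made for this example. It stores every pairwise distance, so its memory grows quadratically with the number of points, which is the kind of cost that confines naive approaches to small data sets.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components (H0).

    Illustrative sketch only: every point is born at scale 0, and a
    component dies at the length of the edge that merges it into another.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges sorted by length: O(n^2) memory (hypothetical
    # naive baseline, not how a scalable implementation would do it).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge at this scale, so one of them dies.
            parent[ri] = rj
            pairs.append((0.0, length))

    # One component survives at every scale.
    pairs.append((0.0, float("inf")))
    return pairs


pairs = h0_persistence([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)])
print(pairs)
```

For the three collinear points above, two components merge at scale 1, the remaining two merge at scale 9, and one component persists indefinitely. Long-lived pairs indicate structure that is robust to noise; short-lived pairs are typically treated as noise.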

Taxonomy

Topics: Topological and Geometric Data Analysis · Genomics and Chromatin Dynamics · Clusterin in disease pathology