Improving Self-supervised Molecular Representation Learning using Persistent Homology
Yuankai Luo, Lei Shi, Veronika Thost

TL;DR
This paper introduces a self-supervised learning (SSL) method for molecular representations based on persistent homology (PH). A new contrastive loss improves predictive performance, especially on small datasets.
Contribution
It proposes a new SSL approach based on persistent homology, comprising an autoencoder and a contrastive loss, that improves molecular property prediction.
Findings
Representations are more predictive after SSL.
The contrastive loss improves baseline performance.
Significant gains on small datasets.
Abstract
Self-supervised learning (SSL) has great potential for molecular representation learning. Molecular graphs are complex, large amounts of unlabelled data are available, and labels are costly to obtain experimentally, so training datasets are often small. The importance of the topic is reflected in the variety of paradigms and architectures that have been investigated recently. Yet the differences in performance often seem minor and are still poorly understood. In this paper, we study SSL based on persistent homology (PH), a mathematical tool for modeling topological features of data that persist across multiple scales. PH has several unique features that particularly suit SSL, naturally offering: different views of the data, stability in terms of distance preservation, and the opportunity to flexibly incorporate domain knowledge. We (1) investigate an…
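The abstract describes PH only as tracking "features that persist across multiple scales." To make that concrete, here is a minimal sketch of 0-dimensional persistent homology on a molecular graph. It is not the authors' pipeline. It assumes a sublevel-set filtration driven by hypothetical per-atom values, such as atomic mass. An atom appears when the threshold reaches its value. A bond appears once both of its atoms are present. Each connected component is recorded as a (birth, death) pair using union-find and the elder rule. The function name `zero_dim_persistence` and the toy values are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def zero_dim_persistence(
    vertex_values: Dict[int, float],
    edges: List[Tuple[int, int]],
) -> List[Tuple[float, float]]:
    """(birth, death) pairs of connected components under a sublevel-set
    filtration on a graph (hypothetical helper for illustration only)."""
    parent = {v: v for v in vertex_values}
    # Birth time of the oldest vertex in each component, kept at the root.
    birth = dict(vertex_values)

    def find(v: int) -> int:
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # A bond enters the filtration when its later endpoint does.
    def edge_value(e: Tuple[int, int]) -> float:
        return max(vertex_values[e[0]], vertex_values[e[1]])

    pairs: List[Tuple[float, float]] = []
    for u, v in sorted(edges, key=edge_value):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # bond closes a ring: a 1-dim feature, not tracked here
        t = edge_value((u, v))
        # Elder rule: the component born later dies at the merge.
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru
        if birth[rv] < t:  # skip zero-persistence pairs
            pairs.append((birth[rv], t))
        parent[rv] = ru
    # Components that never merge persist forever.
    for r in sorted({find(v) for v in vertex_values}):
        pairs.append((birth[r], math.inf))
    return pairs


# Toy 4-atom chain 0-1-2-3 with made-up per-atom filtration values.
values = {0: 1.0, 1: 3.0, 2: 2.0, 3: 0.5}
bonds = [(0, 1), (1, 2), (2, 3)]
print(zero_dim_persistence(values, bonds))  # [(1.0, 3.0), (0.5, inf)]
```

Long bars, those with a large gap between death and birth, correspond to the "persistent" features. Short-lived ones are typically treated as noise. Swapping in a different per-atom function yields a different diagram for the same molecule. This is one plausible reading of how PH "naturally offers different views of the data," though the sketch does not claim to reproduce the paper's choice of filtrations.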
Taxonomy
Topics: Topological and Geometric Data Analysis · Bioinformatics and Genomic Networks · Computational Drug Discovery Methods
