Self-Conditioned Denoising for Atomistic Representation Learning
Tynan Perez, Rafael Gomez-Bombarelli

TL;DR
This paper introduces Self-Conditioned Denoising (SCD), a pretraining method for atomistic data that uses self-embeddings for conditional denoising. SCD improves downstream property prediction across several domains of atomistic data and matches or surpasses supervised force-energy pretraining.
Contribution
The paper presents SCD, a backbone-agnostic self-supervised pretraining approach that enhances atomistic representation learning across diverse datasets and models, outperforming existing SSL methods.
Findings
SCD significantly outperforms previous SSL methods on downstream benchmarks.
A small GNN pretrained with SCD performs competitively with larger models.
SCD matches or exceeds supervised force-energy pretraining results.
Abstract
The success of large-scale pretraining in NLP and computer vision has catalyzed growing efforts to develop analogous foundation models for the physical sciences. However, pretraining strategies using atomistic data remain underexplored. To date, large-scale supervised pretraining on DFT force-energy labels has provided the strongest performance gains to downstream property prediction, outperforming existing methods of self-supervised learning (SSL), which remain limited to ground-state geometries and/or single domains of atomistic data. We address these shortcomings with Self-Conditioned Denoising (SCD), a backbone-agnostic reconstruction objective that utilizes self-embeddings for conditional denoising across any domain of atomistic data, including small molecules, proteins, periodic materials, and 'non-equilibrium' geometries. When controlled for backbone architecture and pretraining…
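The abstract describes SCD only at a high level, so the following is a minimal sketch of a generic conditional coordinate-denoising step, not the authors' implementation. Every name in it (`toy_self_embedding`, `toy_conditional_denoiser`, `conditional_denoising_step`) is hypothetical, and both the choice of embedding (each atom's distance to the centroid of the clean structure) and the way it conditions the denoiser are assumptions made purely for illustration. In practice a learned backbone would stand in for both toy functions.

```python
import math
import random


def centroid(coords):
    """Mean position of a list of 3D atom coordinates."""
    n = len(coords)
    return [sum(atom[k] for atom in coords) / n for k in range(3)]


def toy_self_embedding(coords):
    # ASSUMPTION: stand-in for a learned embedding of the clean structure.
    # Here it is simply each atom's distance to the centroid.
    c = centroid(coords)
    return [math.dist(atom, c) for atom in coords]


def add_noise(coords, sigma, rng):
    """Perturb every coordinate with Gaussian noise; return (noisy, noise)."""
    noise = [[rng.gauss(0.0, sigma) for _ in range(3)] for _ in coords]
    noisy = [[x + e for x, e in zip(atom, eps)]
             for atom, eps in zip(coords, noise)]
    return noisy, noise


def toy_conditional_denoiser(noisy, embedding):
    # ASSUMPTION: a hand-written denoiser in place of a trained network.
    # It moves each noisy atom back to the radius stored in the embedding
    # and reports the displacement as its noise estimate.
    c = centroid(noisy)
    predicted = []
    for atom, radius in zip(noisy, embedding):
        offset = [x - ck for x, ck in zip(atom, c)]
        norm = math.sqrt(sum(o * o for o in offset)) or 1.0
        target = [ck + o / norm * radius for ck, o in zip(c, offset)]
        predicted.append([x - t for x, t in zip(atom, target)])
    return predicted


def denoising_loss(predicted, noise):
    """Mean squared error between predicted and true noise."""
    sq = [(p - e) ** 2
          for pa, ea in zip(predicted, noise)
          for p, e in zip(pa, ea)]
    return sum(sq) / len(sq)


def conditional_denoising_step(coords, sigma=0.1, seed=0):
    """One forward pass: embed clean structure, corrupt, denoise, score."""
    rng = random.Random(seed)
    embedding = toy_self_embedding(coords)      # from the clean structure
    noisy, noise = add_noise(coords, sigma, rng)
    predicted = toy_conditional_denoiser(noisy, embedding)
    return denoising_loss(predicted, noise)


# Illustrative three-atom geometry (arbitrary coordinates).
example_coords = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
example_loss = conditional_denoising_step(example_coords, sigma=0.1, seed=0)
```

The structure is what matters here: the conditioning signal is derived from the same sample that is being corrupted, and the loss compares the predicted noise with the noise actually added. With `sigma=0.0` the toy denoiser predicts zero displacement and the loss vanishes.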
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Electron Microscopy Techniques and Applications · Quantum many-body systems
