CLAMP: A Contrastive Language And Molecule Pre-training Network
Neel Redkar

TL;DR
CLAMP introduces a novel language-to-molecule pre-training network that leverages web-scraped data and contrastive learning to enable zero-shot classification and prediction of chemical materials from text descriptions.
Contribution
The paper presents a new language-to-material generation architecture using contrastive learning with web-scraped data, enabling zero-shot classification without specific training data.
Findings
Achieved ~82% accuracy in zero-shot material classification, without any task-specific training data.
Attained ~75% accuracy in photocatalyst prediction using an extremely small dataset.
Proposes a framework that can be applied to any text-described chemical reaction.
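A minimal sketch of how zero-shot classification with a contrastive model typically works: a material embedding is compared against text embeddings of candidate labels, and the closest label wins. The function names, label strings, and 3-dimensional toy vectors below are illustrative assumptions; in CLAMP the embeddings would come from the trained graph and language encoders, which are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(material_emb, label_embs):
    """Pick the label whose text embedding is most similar to the material embedding.

    material_emb: vector from a (hypothetical) graph encoder.
    label_embs:   dict mapping label text to a vector from a (hypothetical) language encoder.
    """
    scores = {label: cosine(material_emb, emb) for label, emb in label_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings standing in for encoder outputs (assumed values, not real data).
label_embs = {
    "photocatalyst": [0.9, 0.1, 0.0],
    "not a photocatalyst": [0.1, 0.9, 0.0],
}
material_emb = [0.8, 0.2, 0.1]

best, scores = zero_shot_classify(material_emb, label_embs)
print(best, scores)
```

No classifier is trained for the task here: the candidate labels are supplied as text at inference time, which is what makes the setup zero-shot.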
Abstract
This paper highlights a shift in how to approach material generation. Instead of material-to-material generation, we propose a language-to-material generation architecture that utilizes millions of untapped data points. Using a web scraper to collect crystal-text pairs from open-source research papers, a contrastive model can be trained using a convolutional graph neural network encoder and a language encoder. This would allow unsupervised zero-shot classification that takes advantage of linguistic structure. Without any specific training data, ~82% accuracy was achieved, along with ~75% accuracy for photocatalyst prediction using an extremely small dataset. This novel network could ideally be cross-applied to any reaction that can be described via text, opening completely new ways to think about 3D chemical framework generation. In the full experiment diffusion models would…
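The contrastive training described in the abstract can be illustrated with a symmetric, CLIP-style cross-entropy loss over a batch of crystal-text pairs. This is a sketch under assumptions: the abstract only says the model is contrastive, so the exact loss, the temperature value of 0.07, and all function names below are illustrative choices and not taken from the paper. The encoders themselves are omitted, and plain lists stand in for their outputs.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_row(logits, target):
    """Cross-entropy of one row of logits against the index of the correct match."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def contrastive_loss(graph_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair i of graph_embs matches pair i of text_embs.

    temperature is an assumed hyperparameter, not a value from the paper.
    """
    g = [l2_normalize(v) for v in graph_embs]
    t = [l2_normalize(v) for v in text_embs]
    n = len(g)
    # logits[i][j]: scaled similarity between crystal i and text j.
    logits = [
        [sum(a * b for a, b in zip(g[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Crystal-to-text direction: each row should peak on its diagonal entry.
    loss_g2t = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    # Text-to-crystal direction: each column should peak on its diagonal entry.
    loss_t2g = sum(
        cross_entropy_row([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_g2t + loss_t2g)


# Toy batch: matched pairs should give a much lower loss than swapped pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, shuffled)
```

Minimizing a loss of this kind pulls each crystal embedding toward the embedding of its own description and away from the other descriptions in the batch, which is what later allows text prompts to act as class labels.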
Taxonomy
Topics: Machine Learning in Materials Science
Methods: Graph Neural Network · Diffusion
