Beyond Learning on Molecules by Weakly Supervising on Molecules
Gordan Prastalo, Kevin Maik Jablonka

TL;DR
This paper introduces ACE-Mol, a model that uses weak supervision from molecular motifs and natural language descriptors to produce task-specific, interpretable molecular representations, achieving state-of-the-art results on molecular property prediction benchmarks.
Contribution
The paper presents ACE-Mol, a novel approach that leverages weak supervision from programmatically derived motifs and natural language to create task-conditioned molecular embeddings.
Findings
ACE-Mol outperforms existing models on property prediction benchmarks.
ACE-Mol provides interpretable and chemically meaningful representations.
The approach scales easily with cheap, programmatic supervision.
Abstract
Molecular representations are inherently task-dependent, yet most pre-trained molecular encoders are not. Task conditioning promises representations that reorganize based on task descriptions, but existing approaches rely on expensive labeled data. We show that weak supervision on programmatically derived molecular motifs is sufficient. Our Adaptive Chemical Embedding Model (ACE-Mol) learns from hundreds of motifs paired with natural language descriptors that are cheap to compute and trivial to scale. Conventional encoders slowly search the embedding space for task-relevant structure, whereas ACE-Mol immediately aligns its representations with the task. ACE-Mol achieves state-of-the-art performance across molecular property prediction benchmarks with interpretable, chemically meaningful representations.
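To make "programmatically derived motifs paired with natural language descriptors" concrete, here is a minimal sketch of how such weak labels could be generated. It is not the authors' pipeline: this page does not specify how motifs are computed or how examples are formatted. The motif table, the `weak_labels` function, and the record fields are all hypothetical, and regex matching on SMILES strings is a crude, dependency-free stand-in for proper substructure search with a cheminformatics toolkit.

```python
import re

# Hypothetical motif table (an assumption, not from the paper):
# motif name -> (natural-language descriptor, regex applied to the SMILES string).
# Regex on SMILES is only an approximation of real substructure matching;
# for example, "F" would also match inside "[Fe]".
MOTIFS = {
    "halogen": ("number of halogen atoms", re.compile(r"Cl|Br|F|I")),
    "aromatic_carbon": ("number of aromatic carbon atoms", re.compile(r"c")),
    "carbonyl": ("number of carbonyl groups", re.compile(r"C\(=O\)|C=O")),
}


def weak_labels(smiles):
    """Return one (task description, molecule, label) record per motif.

    Labels come from a cheap program rather than an experiment,
    which is what makes the supervision "weak" and easy to scale.
    """
    records = []
    for name, (descriptor, pattern) in MOTIFS.items():
        count = len(pattern.findall(smiles))
        records.append(
            {"motif": name, "task": descriptor, "smiles": smiles, "label": count}
        )
    return records


if __name__ == "__main__":
    # Aspirin, written as a SMILES string.
    for record in weak_labels("CC(=O)Oc1ccccc1C(=O)O"):
        print(record["task"], "->", record["label"])
```

Each record pairs a task description with a molecule and a label, so a task-conditioned encoder could in principle be trained on many such tasks without any experimental measurements. Adding a motif only requires adding one entry to the table.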
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Advanced Graph Neural Networks
