MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning
Kerstin Kläser, Błażej Banaszewski, Samuel Maddrell-Mander, Callum McLean, Luis Müller, Ali Parviz, Shenyang Huang, Andrew Fitzgibbon

TL;DR
MiniMol is a compact, 10-million-parameter foundation model for molecular learning, pre-trained on diverse quantum and biological tasks, demonstrating strong transferability and outperforming larger models on multiple downstream tasks.
Contribution
This work introduces MiniMol, a small yet effective foundation model for molecular learning, designed with parameter efficiency and trained on a diverse set of molecular tasks.
Findings
MiniMol outperforms previous state-of-the-art models on 17 downstream tasks.
Pre-training on diverse tasks enables strong generalization across molecular applications.
MiniMol demonstrates the effectiveness of small, efficient models in complex molecular learning scenarios.
Abstract
In biological tasks, data is rarely plentiful because it is generated from hard-to-gather measurements. Pre-training foundation models on large quantities of available data and then transferring them to low-data downstream tasks is therefore a promising direction. However, how to design effective foundation models for molecular learning remains an open question, with existing approaches typically focusing on models with large parameter capacities. In this work, we propose MiniMol, a foundational model for molecular learning with 10 million parameters. MiniMol is pre-trained on a mix of roughly 3300 sparsely defined graph- and node-level tasks of both quantum and biological nature. The pre-training dataset includes approximately 6 million molecules and 500 million labels. To demonstrate the generalizability of MiniMol across tasks, we evaluate it on downstream tasks…
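The abstract describes the pre-training tasks as "sparsely defined", meaning most molecules carry labels for only a small subset of the tasks. One common way to train on such data is to mask missing labels out of the loss. The snippet below is a minimal sketch of that idea, not MiniMol's actual implementation: the function name `masked_mse`, the use of NaN to mark a missing label, and the choice of squared error are all assumptions made for illustration.

```python
import numpy as np


def masked_mse(preds, labels):
    """Mean squared error over observed labels only.

    Assumption for this sketch: a NaN entry in `labels` means
    "no label for this molecule/task pair".
    """
    mask = ~np.isnan(labels)              # True where a label exists
    if not mask.any():
        return 0.0                        # nothing to learn from in this batch
    safe_labels = np.where(mask, labels, 0.0)   # replace NaN so arithmetic is defined
    sq_err = np.where(mask, (preds - safe_labels) ** 2, 0.0)
    return float(sq_err.sum() / mask.sum())     # average over observed entries only


# Toy batch: 2 molecules x 3 tasks, half of the labels missing.
labels = np.array([[1.0, np.nan, 0.5],
                   [np.nan, np.nan, 2.0]])
preds = np.array([[0.5, 3.0, 0.5],
                  [9.0, 9.0, 1.0]])

loss = masked_mse(preds, labels)
print(loss)
```

Predictions for unlabeled molecule/task pairs (here the values 3.0 and 9.0) do not contribute to the loss, so a single batch can mix molecules labeled for entirely different tasks.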
Taxonomy
Topics: Machine Learning in Materials Science · Machine Learning in Bioinformatics · Computational Drug Discovery Methods
