Benchmarking Pretrained Molecular Embedding Models For Molecular Representation Learning
Mateusz Praski, Jakub Adamczyk, Wojciech Czech

TL;DR
This paper compares 25 pretrained molecular embedding models across 25 datasets and finds that nearly all of them show negligible or no improvement over the baseline ECFP molecular fingerprint; only the CLAMP model performs statistically significantly better.
Contribution
The paper presents what the authors describe as the most extensive comparison of pretrained molecular embedding models to date, and it highlights the need for more rigorous evaluation in molecular representation learning.
Findings
Nearly all neural models show negligible or no improvement over the baseline ECFP fingerprint.
Only the CLAMP model, which is itself based on molecular fingerprints, shows a statistically significant improvement.
The results raise concerns about evaluation rigor in existing studies.
Abstract
Pretrained neural networks have attracted significant interest in chemistry and small molecule drug design. Embeddings from these models are widely used for molecular property prediction, virtual screening, and small data learning in molecular chemistry. This study presents the most extensive comparison of such models to date, evaluating 25 models across 25 datasets. Under a fair comparison framework, we assess models spanning various modalities, architectures, and pretraining strategies. Using a dedicated hierarchical Bayesian statistical testing model, we arrive at a surprising result: nearly all neural models show negligible or no improvement over the baseline ECFP molecular fingerprint. Only the CLAMP model, which is also based on molecular fingerprints, performs statistically significantly better than the alternatives. These findings raise concerns about the evaluation rigor in…
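As a rough illustration of how per-dataset scores for one embedding model might be compared against the ECFP baseline, the sketch below uses a paired bootstrap over score differences. This is a much simpler stand-in and not the hierarchical Bayesian testing model used in the paper; the function name `paired_bootstrap`, its parameters, and the example scores are hypothetical placeholders, not values from the study.

```python
import random
from statistics import mean


def paired_bootstrap(model_scores, baseline_scores, n_resamples=10_000, seed=0):
    """Compare two models on the same datasets with a paired bootstrap.

    Returns a tuple (mean_difference, fraction_positive), where
    fraction_positive is the share of bootstrap resamples in which the
    mean per-dataset difference (model minus baseline) is above zero.
    """
    if len(model_scores) != len(baseline_scores):
        raise ValueError("scores must be paired: one entry per dataset")

    # Per-dataset differences; the pairing removes dataset difficulty effects.
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    rng = random.Random(seed)  # local generator keeps the result reproducible

    positive = 0
    for _ in range(n_resamples):
        # Resample datasets with replacement and recompute the mean difference.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        if mean(resample) > 0:
            positive += 1

    return mean(diffs), positive / n_resamples


if __name__ == "__main__":
    # Placeholder scores (e.g. AUROC per dataset), invented for illustration only.
    embedding_model = [0.81, 0.74, 0.90, 0.66]
    ecfp_baseline = [0.80, 0.75, 0.89, 0.67]
    print(paired_bootstrap(embedding_model, ecfp_baseline))
```

A fraction close to 1 would suggest the model beats the baseline across most resampled collections of datasets, while a value near 0.5 indicates no clear difference. Unlike a hierarchical Bayesian model, this sketch ignores within-dataset variance and does not define a region of practical equivalence.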
