TL;DR
This study systematically compares five molecular embedding techniques (three unsupervised, two supervised) with traditional descriptors and fingerprints in QSAR modeling, and finds that the embeddings do not significantly outperform the traditional representations in predictive tasks.
Contribution
The paper provides a comprehensive experimental comparison of five molecular embedding methods against traditional descriptors and fingerprints in QSAR scenarios.
Findings
Molecular embeddings do not significantly outperform traditional representations in QSAR tasks.
Supervised embeddings are competitive with traditional methods, while unsupervised embeddings tend to perform worse.
A large-scale evaluation with over 25,000 models highlights the need for careful selection of molecular representations.
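The claim that one representation does "not significantly outperform" another rests on a paired comparison of scores obtained on the same datasets. This summary does not say which statistical test the authors used, so the following is only a minimal sketch of one possible approach: an exact sign-flip permutation test on paired per-dataset scores. The function name `paired_permutation_test` and the score lists are hypothetical placeholders for illustration, not results from the paper.

```python
from itertools import product


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired scores.

    Assumption: scores_a[i] and scores_b[i] were measured on the same
    dataset/split with two different molecular representations.
    Returns (mean difference a - b, two-sided p-value).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    # Under the null hypothesis the sign of each paired difference is
    # arbitrary, so enumerate all 2**n sign assignments (fine for small n).
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if stat >= observed - 1e-12:  # tolerance for float rounding
            at_least_as_extreme += 1

    return sum(diffs) / n, at_least_as_extreme / total


# Placeholder scores (made up for illustration, NOT from the paper):
# one value per dataset for each representation.
embedding_scores = [0.71, 0.64, 0.80, 0.58, 0.69, 0.75]
fingerprint_scores = [0.70, 0.66, 0.79, 0.60, 0.68, 0.76]

mean_diff, p_value = paired_permutation_test(embedding_scores, fingerprint_scores)
print(f"mean difference = {mean_diff:+.4f}, p = {p_value:.3f}")
```

With only a handful of datasets the exact enumeration is cheap; for a benchmark of the size described here, a random sample of sign assignments or a library routine would be the more practical choice.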
Abstract
With the consolidation of deep learning in drug discovery, several novel algorithms for learning molecular representations have been proposed. Despite the interest of the community in developing new methods for learning molecular embeddings and their theoretical benefits, comparing molecular embeddings with each other and with traditional representations is not straightforward, which in turn hinders the process of choosing a suitable representation for QSAR modeling. A reason behind this issue is the difficulty of conducting a fair and thorough comparison of the different existing embedding approaches, which requires numerous experiments on various datasets and training scenarios. To close this gap, we reviewed the literature on methods for molecular embeddings and reproduced three unsupervised and two supervised molecular embedding techniques recently proposed in the literature. We…
