Advancing Ligand-based Virtual Screening and Molecular Generation with Pretrained Molecular Embedding Distance
Shiyun Wa, Yifei Wang, Simone Sciabola, Ye Wang

TL;DR
This paper introduces pretrained embedding distance (PED), a scalable, task-agnostic similarity measure derived from pretrained molecular models, enhancing virtual screening and molecular generation in drug discovery.
Contribution
The work presents PED as a novel, efficient similarity metric that does not require task-specific training or hand-crafted descriptors, improving AI-driven drug discovery methods.
Findings
PED shows distinct correlations with traditional similarity metrics.
PED effectively ranks molecules for virtual screening.
PED guides molecular generation via reward design.
Abstract
Molecular similarity plays a central role in ligand-based drug discovery, such as virtual screening, analog searching, and goal-directed molecular generation. However, traditional similarity measures, ranging from fingerprint-based Tanimoto coefficients to 3D shape overlays, are often computationally expensive at scale or rely on hand-crafted molecular descriptors. Meanwhile, many deep learning approaches to similarity-aware design still depend on similarity-specific supervision or costly data curation, limiting their generality across targets. In this work, we propose pretrained embedding distance (PED) as an effective alternative, computed directly from pretrained molecular models without task-specific training. Experimental results show that PED exhibits distinct correlations with traditional similarity metrics, and performs effectively in both ranking molecules for virtual screening…
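The idea of an embedding distance can be illustrated with a minimal sketch. The excerpt above does not state which distance function or which pretrained model PED uses, so everything below is an assumption made for illustration: cosine distance is chosen as the metric, the embeddings are hand-written toy vectors standing in for the output of a pretrained molecular model, and the names `cosine_distance`, `rank_library`, and `similarity_reward` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity (assumed metric; the source does not specify one)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def rank_library(query: Vector, library: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Sort library molecules by ascending distance to the query embedding."""
    scored = [(name, cosine_distance(query, emb)) for name, emb in library.items()]
    return sorted(scored, key=lambda item: item[1])


def similarity_reward(candidate: Vector, reference: Vector) -> float:
    """Hypothetical reward in [0, 1]: rescale the distance (range [0, 2]) and invert it."""
    reward = 1.0 - cosine_distance(candidate, reference) / 2.0
    return min(1.0, max(0.0, reward))  # guard against floating-point drift


# Toy embeddings; real ones would come from a pretrained molecular model.
query = [1.0, 0.0, 0.0]
library = {
    "mol_a": [0.9, 0.1, 0.0],
    "mol_b": [0.0, 1.0, 0.0],
    "mol_c": [-1.0, 0.0, 0.0],
}

for name, dist in rank_library(query, library):
    print(f"{name}: distance={dist:.3f}")
```

In this sketch, the ranking step corresponds to the virtual-screening use (closest library molecules first), and `similarity_reward` shows one way a distance could be turned into a bounded score for goal-directed generation. Neither is claimed to match the paper's actual formulation.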
