Advancing Ligand-based Virtual Screening and Molecular Generation with Pretrained Molecular Embedding Distance

Shiyun Wa; Yifei Wang; Simone Sciabola; Ye Wang

arXiv:2604.24474·cs.LG·April 28, 2026

TL;DR

This paper introduces pretrained embedding distance (PED), a scalable, task-agnostic similarity measure derived from pretrained molecular models, enhancing virtual screening and molecular generation in drug discovery.

Contribution

The work presents PED as an efficient similarity metric that requires neither task-specific training nor hand-crafted descriptors, and applies it to ligand-based virtual screening and goal-directed molecular generation.
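To make the idea of an embedding distance concrete, here is a minimal sketch of two common vector distances. It assumes each molecule has already been mapped to a fixed-length vector by some pretrained encoder. The summary does not say which encoder or which distance the authors use, so the function names, the choice of cosine and Euclidean distance, and the toy vectors are all illustrative assumptions rather than the paper's method.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the straight-line (L2) distance between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical embeddings standing in for the output of a pretrained encoder.
query = [1.0, 0.0, 0.0]
analog = [2.0, 0.0, 0.0]   # same direction as the query, larger magnitude
decoy = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_distance(query, analog), cosine_distance(query, decoy))
print(euclidean_distance(query, analog), euclidean_distance(query, decoy))
```

The two distances can disagree: cosine distance ignores vector magnitude, so `analog` is at distance 0 from `query` under cosine but at distance 1 under Euclidean. Which behavior is preferable depends on how the encoder was trained.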

Findings

01

PED exhibits distinct correlations with traditional similarity metrics.

02

PED effectively ranks molecules for virtual screening.

03

PED guides molecular generation via reward design.
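The second and third findings can be illustrated with a small sketch: ranking a library by distance to a reference compound, and turning a distance into a reward for a generative model. This is not the authors' implementation. The Euclidean distance, the exponential reward shape, the `scale` parameter, and the molecule names and vectors are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_library(
    reference: Sequence[float],
    library: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Sort library molecules by ascending distance to the reference."""
    scored = [(name, euclidean_distance(reference, emb))
              for name, emb in library.items()]
    return sorted(scored, key=lambda item: item[1])


def distance_reward(distance: float, scale: float = 1.0) -> float:
    """Map a non-negative distance to a reward in (0, 1]; closer is higher.

    The exponential form and `scale` are assumptions, not taken from the paper.
    """
    if distance < 0.0 or scale <= 0.0:
        raise ValueError("distance must be >= 0 and scale must be > 0")
    return math.exp(-distance / scale)


# Hypothetical embeddings; a real pipeline would obtain these from an encoder.
reference = [0.0, 0.0]
library = {
    "mol_a": [3.0, 4.0],
    "mol_b": [0.0, 1.0],
    "mol_c": [1.0, 1.0],
}

ranking = rank_library(reference, library)
for name, dist in ranking:
    print(name, round(dist, 3), round(distance_reward(dist), 3))
```

In a screening setting the top of `ranking` would be the candidates to prioritize. In a generation setting, `distance_reward` could be one term in a reinforcement-learning reward, typically combined with other objectives so the generator does not simply reproduce the reference.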

Abstract

Molecular similarity plays a central role in ligand-based drug discovery, such as virtual screening, analog searching, and goal-directed molecular generation. However, traditional similarity measures, ranging from fingerprint-based Tanimoto coefficients to 3D shape overlays, are often computationally expensive at scale or rely on hand-crafted molecular descriptors. Meanwhile, many deep learning approaches to similarity-aware design still depend on similarity-specific supervision or costly data curation, limiting their generality across targets. In this work, we propose pretrained embedding distance (PED) as an effective alternative, computed directly from pretrained molecular models without task-specific training. Experimental results show that PED exhibits distinct correlations with traditional similarity metrics, and performs effectively in both ranking molecules for virtual screening…
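For readers unfamiliar with the fingerprint baseline the abstract mentions, the Tanimoto coefficient of two binary fingerprints is the number of shared on-bits divided by the number of on-bits set in either. The sketch below represents a fingerprint as a set of on-bit indices. The bit indices are made up for illustration, and returning 0.0 for two empty fingerprints is a convention chosen here, not something stated in the source.

```python
from typing import Iterable


def tanimoto(bits_a: Iterable[int], bits_b: Iterable[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| over sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Hypothetical on-bit indices; real fingerprints come from a cheminformatics toolkit.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}

print(tanimoto(fp_a, fp_b))  # 2 shared bits out of 5 distinct bits
```

Note that Tanimoto is a similarity (higher means more alike), whereas an embedding distance runs the other way, so any comparison between the two needs to account for the opposite sign.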
