GP-MoLFormer-Sim: Test Time Molecular Optimization through Contextual Similarity Guidance
Jiri Navratil, Jarret Ross, Payel Das, Youssef Mroueh, Samuel C Hoffman, Vijil Chenthamarakshan, Brian Belgodere

TL;DR
This paper presents a training-free, test-time method for molecular optimization that guides a generative language model to produce molecules similar to a target, with the aim of improving results on property optimization and drug design tasks.
Contribution
Introduces GP-MoLFormer-Sim, a novel test-time approach that uses contextual similarity to steer molecular generation without retraining the model.
Findings
Outperforms existing training-free methods on standard benchmarks.
Effective in property optimization, molecular rediscovery, and drug design.
Enhances generative control via similarity guidance in chemical language models (CLMs).
Abstract
The ability to design molecules while preserving similarity to a target molecule and/or property is crucial for various applications in drug discovery, chemical design, and biology. In this paper, we introduce an efficient training-free method for navigating and sampling from the molecular space with a generative Chemical Language Model (CLM), using molecular similarity to the target as a guide. Our method leverages the contextual representations learned by the CLM itself to estimate molecular similarity, which is then used to adjust the CLM's autoregressive sampling strategy. At each step of the decoding process, the method tracks the distance of the current generations from the target and updates the logits to encourage the preservation of similarity in generations. We implement the method using a recently proposed 47M-parameter SMILES-based CLM,…
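The decoding step described in the abstract (score candidates by similarity to the target, then adjust the logits before sampling) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's actual update rule: the function names, the additive `alpha * cosine` adjustment, and the toy two-dimensional "embeddings" standing in for the CLM's contextual representations are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def guided_probs(logits, candidate_embs, target_emb, alpha=1.0):
    """Hypothetical guidance rule: add alpha * similarity-to-target to each
    candidate token's logit, then renormalize. The paper's rule may differ."""
    adjusted = [
        logit + alpha * cosine(emb, target_emb)
        for logit, emb in zip(logits, candidate_embs)
    ]
    return softmax(adjusted)

# Toy example: three candidate next tokens with equal base logits.
# Each embedding stands in for the representation of the partial sequence
# extended by that token (an assumption of this sketch).
logits = [0.0, 0.0, 0.0]
candidate_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
target_emb = [1.0, 0.0]

probs = guided_probs(logits, candidate_embs, target_emb, alpha=2.0)
print(probs)
```

In this sketch, setting `alpha=0.0` recovers the unguided distribution, while larger values shift probability mass toward candidates whose representation lies closer to the target.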
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Protein Structure and Dynamics
