Scaffold-Conditioned Preference Triplets for Controllable Molecular Optimization with Large Language Models

Yi Xiong; Liang Xiong; Xiaohong Ji; Sen Yang; Zhifeng Gao; Huaimin Wang; Kele Xu

arXiv:2604.12350·cs.LG·April 15, 2026

TL;DR

This paper introduces SCPT, a pipeline for scaffold-conditioned preference triplets that enables controllable, scaffold-preserving molecular optimization using large language models, improving success rates and property gains.

Contribution

The paper presents a novel scaffold-conditioned preference triplet construction method for training LLMs to perform controlled molecular edits that preserve scaffolds.

Findings

1. SCPT improves optimization success and property gains.

2. Models trained on limited supervision generalize well to multi-property tasks.

3. SCPT allows systematic control over similarity and property trade-offs.

Abstract

Molecular property optimization is central to drug discovery, yet many deep learning methods rely on black-box scoring and offer limited control over scaffold preservation, often producing unstable or biologically implausible edits. While large language models (LLMs) are promising molecular generators, optimization remains constrained by the lack of chemistry-grounded preference supervision and principled data curation. We introduce Scaffold-Conditioned Preference Triplets (SCPT), a pipeline that constructs similarity-constrained triplets ⟨scaffold, better, worse⟩ via scaffold alignment and chemistry-driven filters for validity, synthesizability, and meaningful property gains. Using these preferences, we align a pretrained molecular LLM as a conditional editor, enabling property-improving edits that retain the scaffold. Across single- and…
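
To make the triplet idea concrete, here is a minimal sketch of how similarity-constrained ⟨scaffold, better, worse⟩ triplets might be assembled from a pool of scored molecules. It is not the authors' pipeline: the `Candidate` record, the `build_triplets` function, the thresholds, and the toy scores are all hypothetical, scaffold keys are assumed to be precomputed, and the character-bigram similarity is only a stand-in for a fingerprint-based measure that a real implementation would likely use.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    smiles: str    # molecule as a SMILES string
    scaffold: str  # scaffold key, assumed to be precomputed elsewhere
    score: float   # property value; higher is assumed to be better


def bigram_jaccard(a: str, b: str) -> float:
    """Toy similarity on character bigrams (stand-in for a fingerprint measure)."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 1.0


def build_triplets(
    candidates: Iterable[Candidate],
    similarity: Callable[[str, str], float] = bigram_jaccard,
    passes_filters: Callable[[Candidate], bool] = lambda c: True,
    min_similarity: float = 0.4,  # hypothetical threshold
    min_gain: float = 0.1,        # hypothetical threshold
) -> List[Tuple[str, str, str]]:
    """Return (scaffold, better, worse) triplets from molecules sharing a scaffold."""
    # Drop molecules that fail the (assumed) validity/synthesizability checks.
    by_scaffold: Dict[str, List[Candidate]] = {}
    for cand in candidates:
        if passes_filters(cand):
            by_scaffold.setdefault(cand.scaffold, []).append(cand)

    triplets: List[Tuple[str, str, str]] = []
    for scaffold, group in by_scaffold.items():
        for x, y in combinations(group, 2):
            better, worse = (x, y) if x.score >= y.score else (y, x)
            # Keep only pairs with a meaningful property gain ...
            if better.score - worse.score < min_gain:
                continue
            # ... that are still similar enough to count as an edit.
            if similarity(better.smiles, worse.smiles) < min_similarity:
                continue
            triplets.append((scaffold, better.smiles, worse.smiles))
    return triplets


# Illustrative pool with made-up scores (not data from the paper).
pool = [
    Candidate("c1ccccc1O", "c1ccccc1", 0.70),
    Candidate("c1ccccc1N", "c1ccccc1", 0.40),
    Candidate("c1ccccc1C", "c1ccccc1", 0.65),
    Candidate("C1CCCCC1O", "C1CCCCC1", 0.90),
]
triplets = build_triplets(pool)
for scaffold, better, worse in triplets:
    print(scaffold, better, worse)
```

In this toy pool, the pair whose scores differ by only 0.05 is discarded by the gain threshold, and the lone cyclohexane molecule yields no triplet because it has no scaffold partner. Raising `min_similarity` or `min_gain` trades the number of triplets for stricter supervision, which loosely mirrors the similarity and property trade-off the summary mentions.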
