Efficient and Programmable Exploration of Synthesizable Chemical Space
Shitong Luo, Connor W. Coley

TL;DR
PrexSyn is a transformer-based model that efficiently explores synthesizable chemical space, enabling programmable molecular discovery and optimization with high speed and property-specific control.
Contribution
It introduces PrexSyn, a large-scale, decoder-only transformer trained on a billion-scale stream of synthesizable pathways paired with molecular properties, capable of property-conditioned molecule generation and optimization.
Findings
Achieves near-perfect reconstruction of synthesizable chemical space.
Generates molecules satisfying complex property conditions.
Outperforms synthesis-agnostic baselines in sampling efficiency.
Abstract
The constrained nature of synthesizable chemical space poses a significant challenge for sampling molecules that are both synthetically accessible and possess desired properties. In this work, we present PrexSyn, an efficient and programmable model for molecular discovery within synthesizable chemical space. PrexSyn is based on a decoder-only transformer trained on a billion-scale datastream of synthesizable pathways paired with molecular properties, enabled by a real-time, high-throughput C++-based data generation engine. The large-scale training data allows PrexSyn to reconstruct the synthesizable chemical space nearly perfectly at a high inference speed and learn the association between properties and synthesizable molecules. Based on its learned property-pathway mappings, PrexSyn can generate synthesizable molecules that satisfy not only single-property conditions but also composite…
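The abstract distinguishes single-property conditions from composite ones. As a rough illustration of what a composite condition can look like, the sketch below builds a boolean query out of per-property thresholds and uses it as a post-hoc filter over candidate property values. It is a minimal sketch under stated assumptions and does not describe PrexSyn's mechanism, which conditions generation inside the transformer. The classes `Cond`, `And`, `Or` and `Not`, the property keys (`mw`, `logp`, `qed`, `tpsa`), the thresholds and the candidate values are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a molecule's computed properties as name -> value.
Props = Dict[str, float]
# A query is any callable that maps a property dict to True/False.
Query = Callable[[Props], bool]

_OPS: Dict[str, Callable[[float, float], bool]] = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}


@dataclass(frozen=True)
class Cond:
    """Single-property condition, e.g. Cond("logp", "<=", 3.0) (hypothetical form)."""
    prop: str
    op: str
    value: float

    def __call__(self, p: Props) -> bool:
        return _OPS[self.op](p[self.prop], self.value)


@dataclass(frozen=True)
class And:
    """True only if every sub-query holds."""
    parts: Tuple[Query, ...]

    def __call__(self, p: Props) -> bool:
        return all(q(p) for q in self.parts)


@dataclass(frozen=True)
class Or:
    """True if at least one sub-query holds."""
    parts: Tuple[Query, ...]

    def __call__(self, p: Props) -> bool:
        return any(q(p) for q in self.parts)


@dataclass(frozen=True)
class Not:
    """Negates a sub-query."""
    inner: Query

    def __call__(self, p: Props) -> bool:
        return not self.inner(p)


def select(candidates: List[Props], query: Query) -> List[int]:
    """Return the indices of candidates that satisfy the composite query."""
    return [i for i, p in enumerate(candidates) if query(p)]


# Hypothetical composite query: MW <= 500 AND NOT(logP > 5) AND (QED >= 0.6 OR TPSA <= 90).
query = And((
    Cond("mw", "<=", 500.0),
    Not(Cond("logp", ">", 5.0)),
    Or((Cond("qed", ">=", 0.6), Cond("tpsa", "<=", 90.0))),
))

# Made-up property values for four candidate molecules (illustration only).
candidates = [
    {"mw": 320.0, "logp": 2.1, "qed": 0.72, "tpsa": 110.0},
    {"mw": 540.0, "logp": 3.0, "qed": 0.80, "tpsa": 60.0},
    {"mw": 410.0, "logp": 5.5, "qed": 0.50, "tpsa": 70.0},
    {"mw": 280.0, "logp": 1.2, "qed": 0.40, "tpsa": 85.0},
]

if __name__ == "__main__":
    print(select(candidates, query))  # prints [0, 3]
```

Filtering after generation in this way wastes every sample that fails the query. That cost is why a model that takes the condition as input and generates toward it, as the abstract describes, can be more sample-efficient than generate-then-filter.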
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Advanced Graph Neural Networks
