Hierarchical Structure-Property Alignment for Data-Efficient Molecular Generation and Editing
Ziyu Fan, Zhijian Huang, Yahan Li, Xiaowen Hu, Siyuan Shen, Yunliang Wang, Zeyu Zhong, Shuhong Liu, Shuning Yang, Shangqian Wu, Min Wu, Lei Deng

TL;DR
HSPAG is a hierarchical, data-efficient framework for property-constrained molecular generation and editing. It learns structure-property relationships at the atom, substructure, and whole-molecule levels and reduces the amount of pre-training data required.
Contribution
The paper introduces HSPAG, a hierarchical structure-property alignment model that improves data efficiency and supports controllable molecular generation when property annotations are sparse.
Findings
HSPAG effectively captures detailed structure-property relationships.
The model supports controllable generation under multiple property constraints.
Case studies demonstrate successful molecular editing capabilities.
Abstract
Property-constrained molecular generation and editing are crucial in AI-driven drug discovery but remain hindered by two factors: (i) capturing the complex relationships between molecular structures and multiple properties remains challenging, and (ii) the narrow coverage and incomplete annotations of molecular properties weaken the effectiveness of property-based models. To tackle these limitations, we propose HSPAG, a data-efficient framework featuring hierarchical structure-property alignment. By treating SMILES and molecular properties as complementary modalities, the model learns their relationships at atom, substructure, and whole-molecule levels. Moreover, we select representative samples through scaffold clustering and hard samples via an auxiliary variational auto-encoder (VAE), substantially reducing the required pre-training data. In addition, we incorporate a property…
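The data-selection idea in the abstract (representative samples from scaffold clusters, plus hard samples flagged by an auxiliary VAE) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `select_subset`, the record fields `scaffold` and `recon_loss`, the rule of keeping the first molecules of each cluster, and the use of reconstruction loss as the hardness score are all assumptions made for illustration. In practice the scaffold strings might come from a cheminformatics toolkit such as RDKit, and the loss values from a trained VAE; here both are hard-coded toy values.

```python
from collections import defaultdict


def select_subset(records, per_scaffold=1, n_hard=1):
    """Pick representative and hard molecules from a list of records.

    Each record is a dict with hypothetical keys:
      'smiles'     - the molecule's SMILES string
      'scaffold'   - a precomputed scaffold identifier (assumed given)
      'recon_loss' - reconstruction loss from an auxiliary VAE (assumed given)
    """
    # Group molecules that share a scaffold into one cluster.
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec["scaffold"]].append(rec)

    # Assumption: the first `per_scaffold` members of each cluster
    # (in input order) serve as that cluster's representatives.
    representatives = []
    for members in clusters.values():
        representatives.extend(members[:per_scaffold])

    # Assumption: among the molecules not already chosen, the ones with
    # the highest reconstruction loss are treated as "hard" samples.
    chosen = {rec["smiles"] for rec in representatives}
    remaining = [rec for rec in records if rec["smiles"] not in chosen]
    remaining.sort(key=lambda rec: rec["recon_loss"], reverse=True)
    hard = remaining[:n_hard]

    return representatives, hard


# Toy input: scaffold labels and loss values are made up for illustration.
records = [
    {"smiles": "c1ccccc1O", "scaffold": "c1ccccc1", "recon_loss": 0.2},
    {"smiles": "c1ccccc1N", "scaffold": "c1ccccc1", "recon_loss": 1.5},
    {"smiles": "c1ccccc1C", "scaffold": "c1ccccc1", "recon_loss": 0.4},
    {"smiles": "C1CCNCC1C", "scaffold": "C1CCNCC1", "recon_loss": 0.9},
    {"smiles": "C1CCNCC1O", "scaffold": "C1CCNCC1", "recon_loss": 0.1},
]

representatives, hard = select_subset(records, per_scaffold=1, n_hard=1)
print([r["smiles"] for r in representatives])
print([r["smiles"] for r in hard])
```

Keeping the two selection steps separate means the scaffold-based subset preserves structural diversity, while the loss-based subset adds molecules the auxiliary model struggles to reconstruct. How the paper actually balances the two is not stated in the truncated abstract.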
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Chemical Synthesis and Analysis
