Multi-Constraint Molecular Generation using Sparsely Labelled Training Data for Localized High-Concentration Electrolyte Diluent Screening
Jonathan P. Mailoa, Xin Li, Jiezhong Qiu, Shengyu Zhang

TL;DR
This paper introduces ConGen, a semi-supervised molecular generative model that trains on sparsely labelled data and generates molecules satisfying multiple property constraints. It is demonstrated on diluent screening for localized high-concentration electrolytes.
Contribution
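The work extends the semi-supervised variational auto-encoder (SSVAE), which accepts only fully labelled or fully unlabelled molecules, so that it can also train on molecules with only some of their property labels present. This enables multi-constraint molecule generation from limited property labels.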
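One common way to train on sparsely labelled data is to compute the property loss only over labels that are actually present. The sketch below illustrates that general idea in plain Python; it is not taken from the paper, and the function name `masked_property_loss`, the use of `None` for a missing label, and the toy values are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def masked_property_loss(
    y_pred: Sequence[Sequence[float]],
    y_true: Sequence[Sequence[Optional[float]]],
) -> float:
    """Mean squared error over observed labels only.

    Each row is one molecule and each column is one property.
    A missing label is represented by None (an assumption of this sketch)
    and contributes nothing to the loss.
    """
    total = 0.0
    n_observed = 0
    for pred_row, true_row in zip(y_pred, y_true):
        for pred, true in zip(pred_row, true_row):
            if true is None:
                continue  # unlabelled entry: skip it
            total += (pred - true) ** 2
            n_observed += 1
    # With no observed labels at all, there is nothing to penalise.
    return total / n_observed if n_observed else 0.0


# Toy data: three molecules, two hypothetical properties, partly labelled.
labels = [[1.0, None], [None, 2.0], [3.0, 4.0]]
predictions = [[1.5, 0.0], [0.0, 2.0], [3.0, 5.0]]
loss = masked_property_loss(predictions, labels)
```

In this toy case only four of the six entries are labelled, so the loss averages over those four squared errors and ignores the predictions made for missing entries.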
Findings
ConGen outperforms existing models in multi-constraint molecule generation.
It generates candidate molecules for lithium-ion battery electrolytes.
The approach reduces the need for fully labeled datasets in molecular design.
Abstract
Recently, machine learning methods have been used to propose molecules with desired properties, which is especially useful for exploring large chemical spaces efficiently. However, these methods rely on fully labelled training data and are not practical in situations where molecules with multiple property constraints are required. There is often insufficient training data for all those properties from publicly available databases, especially when ab-initio simulation or experimental property data is also desired for training the conditional molecular generative model. In this work, we show how to modify a semi-supervised variational auto-encoder (SSVAE) model, which works only with fully labelled and fully unlabelled molecular property training data, into the ConGen model, which also works on training data with sparsely populated labels. We evaluate ConGen's performance in…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Fuel Cells and Related Materials
