ChemFixer: Correcting Invalid Molecules to Unlock Previously Unseen Chemical Space
Jun-Hyoung Park, Ho-Jun Song, and Seong-Whan Lee

TL;DR
ChemFixer is a transformer-based framework that corrects invalid molecules generated by deep learning models, thereby expanding accessible chemical space and improving drug discovery processes.
Contribution
The paper introduces ChemFixer, a method that corrects invalid molecules using a transformer architecture pre-trained with masking and fine-tuned on a large-scale dataset of valid/invalid molecular pairs.
Findings
Significantly improves molecular validity across generative models.
Preserves chemical and biological distributional properties of molecules.
Enhances drug discovery by expanding chemical space and aiding downstream tasks.
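To make the validity claim concrete, the following is a minimal sketch of how a validity rate could be computed before and after correction. It is not the paper's evaluation code. The names `toy_smiles_check` and `validity_rate`, the use of SMILES strings, and the example molecules are all assumptions made for illustration. The checker only tests balanced branches and ring-closure digits. A real evaluation would use a cheminformatics parser instead, for example RDKit's `Chem.MolFromSmiles`, which returns `None` for strings it cannot parse.

```python
from typing import Callable, Iterable


def toy_smiles_check(smiles: str) -> bool:
    """Toy syntactic check (an illustrative assumption, not real valence checking).

    Accepts a string only if '(' and ')' are balanced, every ring-closure
    digit outside [...] appears an even number of times, and no bracket
    atom is left open.
    """
    if not smiles:
        return False
    depth = 0
    open_rings = set()
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif in_bracket:
            continue  # digits inside [...] are isotopes, charges, or H counts
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing a branch that was never opened
        elif ch.isdigit():
            open_rings ^= {ch}  # toggle: first occurrence opens, second closes
    return depth == 0 and not open_rings and not in_bracket


def validity_rate(
    smiles_list: Iterable[str],
    is_valid: Callable[[str], bool] = toy_smiles_check,
) -> float:
    """Fraction of strings accepted by `is_valid` (0.0 for an empty input)."""
    items = list(smiles_list)
    if not items:
        return 0.0
    return sum(1 for s in items if is_valid(s)) / len(items)


# Hypothetical generator outputs: two fine, one unclosed ring, one unclosed branch.
generated = ["c1ccccc1", "CC(=O)O", "C1CCCC", "CC(C"]
# Hypothetical outputs of a corrector applied to the same four strings.
corrected = ["c1ccccc1", "CC(=O)O", "C1CCCC1", "CC(C)C"]

print(validity_rate(generated))
print(validity_rate(corrected))
```

Passing the validator as a parameter keeps the metric independent of the check, so the toy function can be swapped for a toolkit-backed one without touching `validity_rate`.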
Abstract
Deep learning-based molecular generation models have shown great potential in efficiently exploring vast chemical spaces by generating potential drug candidates with desired properties. However, these models often produce chemically invalid molecules, which limits the usable scope of the learned chemical space and poses significant challenges for practical applications. To address this issue, we propose ChemFixer, a framework designed to correct invalid molecules into valid ones. ChemFixer is built on a transformer architecture, pre-trained using masking techniques, and fine-tuned on a large-scale dataset of valid/invalid molecular pairs that we constructed. Through comprehensive evaluations across diverse generative models, ChemFixer improved molecular validity while effectively preserving the chemical and biological distributional properties of the original outputs. This indicates…
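The abstract says the model is pre-trained using masking techniques but does not specify the scheme. As a rough illustration only, here is a generic masked-token corruption step of the kind commonly used for such pre-training. The `<mask>` token name, the character-level tokenization, the 0.3 masking probability, and the function `mask_tokens` are assumptions, not details taken from the paper.

```python
import random
from typing import List, Optional, Tuple

MASK = "<mask>"  # hypothetical mask token name


def mask_tokens(
    tokens: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Replace each token with MASK independently with probability `mask_prob`.

    Returns (masked_tokens, labels). labels[i] holds the original token at
    masked positions and None elsewhere, so a model would be trained to
    predict only the hidden tokens.
    """
    if rng is None:
        rng = random.Random()
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Character-level tokens of a SMILES string (a simplification for the sketch).
tokens = list("CC(=O)O")
masked, labels = mask_tokens(tokens, mask_prob=0.3, rng=random.Random(0))
print(masked)
print(labels)
```

A seeded `random.Random` is passed in so the corruption is reproducible. Fine-tuning on valid/invalid pairs would be a separate sequence-to-sequence step that this sketch does not cover.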
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Advanced Graph Neural Networks
