Chemical-Reaction-Aware Molecule Representation Learning
Hongwei Wang, Weijiang Li, Xiaomeng Jin, Kyunghyun Cho, Heng Ji, Jiawei Han, Martin D. Burke

TL;DR
This paper introduces a novel molecule representation learning method that leverages chemical reactions to improve embedding organization and generalization, achieving state-of-the-art results across multiple chemical prediction tasks.
Contribution
The proposed approach uses chemical-reaction constraints to enhance molecule embeddings and does not depend on a particular GNN architecture, leading to improved performance and generalization.
Findings
Achieves 17.4% Hit@1 gain in reaction prediction
Improves AUC by 2.3% in property prediction
Reduces RMSE by 18.5% in graph-edit-distance prediction
Abstract
Molecule representation learning (MRL) methods aim to embed molecules into a real vector space. However, existing SMILES-based (Simplified Molecular-Input Line-Entry System) or GNN-based (Graph Neural Networks) MRL methods either take SMILES strings as input that have difficulty in encoding molecule structure information, or over-emphasize the importance of GNN architectures but neglect their generalization ability. Here we propose using chemical reactions to assist learning molecule representation. The key idea of our approach is to preserve the equivalence of molecules with respect to chemical reactions in the embedding space, i.e., forcing the sum of reactant embeddings and the sum of product embeddings to be equal for each chemical equation. This constraint is proven effective to 1) keep the embedding space well-organized and 2) improve the generalization ability of molecule…
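The constraint described in the abstract (the sum of reactant embeddings should equal the sum of product embeddings) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the helper names `sum_embeddings` and `reaction_residual`, the squared Euclidean distance used to measure the mismatch, and the hand-picked 2-dimensional toy vectors. None of it is taken from the paper's code, and a real system would obtain the embeddings from a trained encoder rather than a lookup table.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sum_embeddings(embeddings: List[Vector]) -> List[float]:
    """Element-wise sum of a non-empty list of equal-length vectors."""
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    return [sum(e[i] for e in embeddings) for i in range(dim)]


def reaction_residual(reactants: List[Vector], products: List[Vector]) -> float:
    """Squared Euclidean distance between the summed reactant embeddings
    and the summed product embeddings. A value of 0.0 means the reaction
    is perfectly preserved in the embedding space.
    (The choice of distance is an assumption for this sketch.)"""
    r = sum_embeddings(reactants)
    p = sum_embeddings(products)
    return sum((a - b) ** 2 for a, b in zip(r, p))


# Hypothetical toy embeddings, chosen by hand (not learned, not from the paper).
emb = {
    "A": [1.0, 0.0],
    "B": [0.0, 2.0],
    "C": [1.0, 2.0],  # equals A + B, so "A + B -> C" satisfies the constraint
    "D": [3.0, 3.0],  # does not equal A + B
}

consistent = reaction_residual([emb["A"], emb["B"]], [emb["C"]])
inconsistent = reaction_residual([emb["A"], emb["B"]], [emb["D"]])
print(consistent, inconsistent)
```

In this toy setup the reaction "A + B -> C" has zero residual, while swapping in D as the product gives a positive residual. A training objective built on this idea would push the residual toward zero for observed reactions; how the paper does that is not specified in the truncated abstract above.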
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Machine Learning in Bioinformatics
