Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport
Fang Wu, Nicolas Courty, Shuting Jin, Stan Z. Li

TL;DR
This paper introduces MROT, an optimal transport-based algorithm that helps molecular representation learning generalize across chemical domains, particularly when training data are limited or heterogeneous.
Contribution
The paper develops MROT, an optimal transport method that combines a new metric of domain distance with a posterior variance regularization over the transport plan to improve generalization in molecular regression.
Findings
MROT outperforms existing models in chemical property prediction.
MROT effectively bridges chemical domain gaps.
Results demonstrate MROT's potential in discovering new substances.
Abstract
Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT learns a continuous label of the data by measuring a new metric of domain distances and a posterior variance regularization over the transport plan to bridge the chemical domain gap. Among downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the…
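To make the optimal transport idea in the abstract concrete, the following is a minimal sketch, not the authors' MROT implementation. It assumes a plain squared-Euclidean cost between feature vectors, a standard entropic (Sinkhorn) solver, and one possible reading of "posterior variance": the variance of source labels under the plan's conditional distribution for each target point. The function names `sinkhorn` and `posterior_label_stats`, the toy data, and the parameter values are all hypothetical.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.5, n_iter=200):
    """Entropic-regularized OT plan between histograms a (source) and b (target)."""
    K = np.exp(-cost / reg)            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)              # rescale to match target marginal
        u = a / (K @ v)                # rescale to match source marginal
    return u[:, None] * K * v[None, :]

def posterior_label_stats(plan, y_src):
    """Mean and variance of source labels transported to each target point.

    Assumption: the 'posterior' is the column-normalized plan,
    i.e. p(source i | target j).
    """
    w = plan / plan.sum(axis=0, keepdims=True)
    mean = w.T @ y_src
    var = w.T @ (y_src ** 2) - mean ** 2
    return mean, var

# Toy data standing in for learned molecular embeddings (hypothetical).
rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, size=(6, 3))   # labeled source domain
y_src = x_src.sum(axis=1)                   # continuous regression labels
x_tgt = rng.normal(0.5, 1.0, size=(4, 3))   # unlabeled, shifted target domain

# Squared-Euclidean cost, scaled to [0, 1] for numerical stability.
cost = ((x_src[:, None, :] - x_tgt[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()

a = np.full(6, 1 / 6)
b = np.full(4, 1 / 4)
plan = sinkhorn(a, b, cost)
mean, var = posterior_label_stats(plan, y_src)
```

In this sketch, `mean` acts as a transported label estimate for each target point, and a regularizer in the spirit of the abstract could penalize `var.sum()` so that each target point receives mass from source points with similar labels. How MROT actually defines its domain distance metric and its regularizer is not specified in the text above.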
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Fuel Cells and Related Materials
Methods: Triplet Loss
