Balancing Molecular Information and Empirical Data in the Prediction of Physico-Chemical Properties
Johannes Zenn, Dominik Gond, Fabian Jirasek, Robert Bamler

TL;DR
This paper introduces a hybrid approach that combines molecular descriptors with data-driven representation learning via the expectation maximization algorithm. By adaptively balancing the two sources of information, it improves the accuracy of physico-chemical property predictions.
Contribution
It presents a novel probabilistic framework that automatically detects, for each prediction, whether to rely on molecular structure or on empirical data.
Findings
Significantly improves prediction accuracy over existing methods.
Effectively detects unreliable structure-based predictions and corrects them.
Demonstrates success on activity coefficient prediction in binary mixtures.
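To make the balancing idea concrete, here is a minimal sketch in Python. It is not the authors' model: it assumes a toy setting in which each substance has a single scalar latent property, a descriptor-based guess serves as the prior mean, and noisy measurements are Gaussian. The function name fit_prior_variances and all parameters are hypothetical. Expectation maximization learns one prior variance per substance, so a small variance means "trust the structure-based guess" and a large variance means "trust the data".

```python
import numpy as np

def fit_prior_variances(y_mean, n_obs, prior_mean, noise_var=0.1,
                        n_iter=200, floor=1e-8):
    """Toy EM (illustrative assumption, not the paper's algorithm).

    Model per substance i:
        z_i  ~ Normal(prior_mean[i], s2[i])   # descriptor-based prior
        y_ij ~ Normal(z_i, noise_var)         # n_obs[i] measurements
    EM learns the prior variances s2[i].
    """
    y_mean, n_obs, prior_mean = (
        np.asarray(a, dtype=float) for a in (y_mean, n_obs, prior_mean)
    )
    s2 = np.ones_like(prior_mean)  # initial prior variances

    def posterior(s2):
        # Gaussian posterior of z_i given its measurements and prior.
        prec = 1.0 / s2 + n_obs / noise_var
        var = 1.0 / prec
        mean = var * (prior_mean / s2 + n_obs * y_mean / noise_var)
        return mean, var

    for _ in range(n_iter):
        mean, var = posterior(s2)                      # E-step
        # M-step: s2[i] = E[(z_i - prior_mean[i])^2] under the posterior.
        # The floor only guards against division by zero.
        s2 = np.maximum((mean - prior_mean) ** 2 + var, floor)

    mean, _ = posterior(s2)
    return mean, s2

# Hypothetical data: the descriptor-based guess is 1.0 for all three
# substances; the measurements of the second one disagree strongly.
prior_mean = [1.0, 1.0, 1.0]
y_mean = [1.05, 3.0, 1.0]   # average of the measurements per substance
n_obs = [20, 20, 20]        # number of measurements per substance

post_mean, prior_var = fit_prior_variances(y_mean, n_obs, prior_mean)
print("posterior means:", post_mean)
print("learned prior variances:", prior_var)
```

In this toy run, the learned prior variance stays near zero where the measurements agree with the descriptor-based guess and grows large for the substance whose measurements contradict it, so that prediction follows the data instead. This mirrors, in a heavily simplified form, the behavior described in the findings above.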
Abstract
Predicting the physico-chemical properties of pure substances and mixtures is a central task in thermodynamics. Established prediction methods range from fully physics-based ab-initio calculations, which are only feasible for very simple systems, over descriptor-based methods that use some information on the molecules to be modeled together with fitted model parameters (e.g., quantitative-structure-property relationship methods or classical group contribution methods), to representation-learning methods, which may, in extreme cases, completely ignore molecular descriptors and extrapolate only from existing data on the property to be modeled (e.g., matrix completion methods). In this work, we propose a general method for combining molecular descriptors with representation learning using the so-called expectation maximization algorithm from the probabilistic machine learning literature,…
Taxonomy
Topics: Computational Drug Discovery Methods · Metabolomics and Mass Spectrometry Studies
