Pure Component Property Estimation Framework Using Explainable Machine Learning Methods
Jianfeng Jiao, Xi Gao, and Jie Li

TL;DR
This paper introduces an explainable machine learning framework for predicting pure component properties, utilizing molecular connectivity features, feature ranking, and interpretability techniques to improve accuracy and understanding.
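As a rough illustration of what a connectivity-based representation can look like, the sketch below builds a symmetric connectivity matrix from a bond list and counts bonded atom pairs as simple features. The function names, the bond-list input format, and the pair-count features are hypothetical choices made for illustration; this summary does not describe the paper's actual feature-generation procedure.

```python
from collections import Counter


def connectivity_matrix(atoms, bonds):
    """Build a symmetric matrix whose (i, j) entry is the bond order
    between atoms i and j, or 0 if they are not bonded.

    `bonds` is a list of (i, j, order) tuples (a hypothetical input format).
    """
    n = len(atoms)
    matrix = [[0] * n for _ in range(n)]
    for i, j, order in bonds:
        matrix[i][j] = order
        matrix[j][i] = order
    return matrix


def bonded_pair_counts(atoms, matrix):
    """Count bonded element pairs, keyed by (element, element, bond order).

    This is an illustrative feature set, not the paper's.
    """
    counts = Counter()
    n = len(atoms)
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j]:
                key = tuple(sorted((atoms[i], atoms[j]))) + (matrix[i][j],)
                counts[key] += 1
    return counts


# Ethanol, heavy atoms only: C-C-O with two single bonds.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1, 1), (1, 2, 1)]
ethanol_matrix = connectivity_matrix(ethanol_atoms, ethanol_bonds)
ethanol_features = bonded_pair_counts(ethanol_atoms, ethanol_matrix)
```

Because the features are read directly off the matrix, no hand-defined functional groups are needed, which is the general appeal of automatically generated connectivity features.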
Contribution
The work presents a novel framework combining molecular connectivity representations, feature pooling, and Shapley value analysis for accurate and interpretable property prediction.
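To make the idea behind Shapley value analysis concrete, the sketch below computes exact Shapley values for a tiny toy model by averaging each feature's marginal contribution over all feature orderings. The toy model, the instance, and the all-zero baseline are assumptions chosen for illustration. Exact enumeration is only feasible for a handful of features; practical tools rely on approximations, and this summary does not state which implementation the authors used.

```python
from itertools import permutations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Average each feature's marginal contribution over all orderings.

    `value_fn` maps a frozenset of feature indices (the coalition) to a
    model output.
    """
    phi = [0.0] * n_features
    for order in permutations(range(n_features)):
        included = set()
        previous = value_fn(frozenset(included))
        for i in order:
            included.add(i)
            current = value_fn(frozenset(included))
            phi[i] += current - previous
            previous = current
    n_orderings = factorial(n_features)
    return [p / n_orderings for p in phi]


def toy_model(x):
    # Hypothetical property model with one interaction term (x0 * x2).
    return 2.0 * x[0] + 3.0 * x[1] + x[0] * x[2]


instance = (1.0, 1.0, 2.0)   # the prediction being explained (assumed)
baseline = (0.0, 0.0, 0.0)   # reference input for "absent" features (assumed)


def coalition_value(subset):
    # Features in the coalition take the instance value, others the baseline.
    x = [instance[i] if i in subset else baseline[i] for i in range(len(instance))]
    return toy_model(x)


shapley_values = exact_shapley(coalition_value, len(instance))
```

The attributions sum to the difference between the model output at the instance and at the baseline, and the interaction term is split equally between the two features that take part in it.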
Findings
Root-mean-square error is reduced by up to 83.8% compared with group contribution (GC) models.
Feature pooling reduces the number of features from 13,316 to 100 without losing accuracy.
Different properties are influenced by distinct structural features, consistent with mechanistic insights.
Abstract
Accurate prediction of pure component physicochemical properties is crucial for process integration, multiscale modeling, and optimization. In this work, an enhanced framework for pure component property prediction using explainable machine learning methods is proposed. In this framework, a molecular representation method based on the connectivity matrix accounts for atomic bonding relationships and generates features automatically. A supervised machine learning model, random forest, is applied for feature ranking and pooling. The adjusted R² is introduced to penalize the inclusion of additional features, providing an assessment of the true contribution of features. The prediction results for normal boiling point (Tb), liquid molar volume, critical temperature (Tc) and critical pressure (Pc) obtained using Artificial Neural Network and Gaussian Process Regression models…
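The adjusted R² mentioned in the abstract is conventionally defined as 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of samples and p the number of features. The sketch below implements that standard definition. The function names are illustrative, and it is an assumption that the paper uses exactly this form.

```python
def r_squared(y_true, y_pred):
    """Ordinary coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """Standard adjusted R²: penalizes R² for the number of features used."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Same raw R² = 0.9 on 101 samples, with different feature counts.
few_features = adjusted_r_squared(0.9, n_samples=101, n_features=0)
many_features = adjusted_r_squared(0.9, n_samples=101, n_features=50)
```

With the raw R² held fixed, the adjusted value drops as features are added, so an extra feature has to improve the fit by more than its penalty to count as a real contribution.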
Taxonomy
Topics: Machine Learning in Materials Science · Phase Equilibria and Thermodynamics · Computational Drug Discovery Methods
Methods: Sparse Evolutionary Training · Gaussian Process
