A Machine Learning Pipeline for Molecular Property Prediction using ChemXploreML
Aravindh Nivas Marimuthu, Brett A. McGuire

TL;DR
ChemXploreML is a modular desktop application for customizable molecular property prediction. It pairs interchangeable molecular embedding techniques with machine learning algorithms and is demonstrated on five molecular properties, reaching an R^2 of up to 0.93 for critical temperature.
Contribution
Introduces ChemXploreML, a modular framework that integrates diverse molecular embeddings with machine learning algorithms, making property prediction accessible to researchers without extensive programming expertise.
Findings
High prediction accuracy with R^2 up to 0.93 for critical temperature.
VICGAE embeddings offer comparable accuracy to Mol2Vec with better efficiency.
Framework supports easy integration of new techniques and automates data processing.
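The R^2 value quoted in the findings is the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The snippet below is a minimal sketch of that calculation in plain Python. The function name `r2_score` and the temperature values are illustrative assumptions; they are not taken from the paper's code or dataset.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical critical temperatures in kelvin (illustrative only).
observed = [500.0, 550.0, 600.0, 650.0]
predicted = [510.0, 540.0, 610.0, 640.0]

score = r2_score(observed, predicted)
print(f"R^2 = {score:.3f}")
```

An R^2 of 1 means the predictions match the observations exactly, while 0 means the model does no better than always predicting the mean of the observed values.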
Abstract
We present ChemXploreML, a modular desktop application designed for machine learning-based molecular property prediction. The framework's flexible architecture allows integration of any molecular embedding technique with modern machine learning algorithms, enabling researchers to customize their prediction pipelines without extensive programming expertise. To demonstrate the framework's capabilities, we implement and evaluate two molecular embedding approaches - Mol2Vec and VICGAE (Variance-Invariance-Covariance regularized GRU Auto-Encoder) - combined with state-of-the-art tree-based ensemble methods (Gradient Boosting Regression, XGBoost, CatBoost, and LightGBM). Using five fundamental molecular properties as test cases - melting point (MP), boiling point (BP), vapor pressure (VP), critical temperature (CT), and critical pressure (CP) - we validate our framework on a dataset from the…
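The pipeline described in the abstract (embed each molecule as a vector, then fit a tree-based ensemble regressor) can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: random 4-dimensional vectors stand in for Mol2Vec or VICGAE embeddings, the target is a synthetic property, and a hand-written gradient boosting loop over decision stumps stands in for the library regressors named above. It is not ChemXploreML's implementation.

```python
import random


def fit_stump(X, residuals):
    """Best single-feature threshold split (least squares) on the residuals."""
    mean_r = sum(residuals) / len(residuals)
    # Fallback is a constant stump, used only if no valid split exists.
    best = (float("inf"), 0, float("inf"), mean_r, mean_r)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best[0]:
                best = (err, j, thr, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)


def predict_stump(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value


def fit_gbr(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps


def predict_gbr(model, row):
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in data: random "embeddings" and a made-up target property.
rng = random.Random(0)


def make_molecule():
    x = [rng.uniform(-1, 1) for _ in range(4)]
    y = 300 + 40 * x[0] - 25 * x[1] + rng.gauss(0, 2)
    return x, y


data = [make_molecule() for _ in range(80)]
X_train, y_train = [x for x, _ in data[:60]], [y for _, y in data[:60]]
X_test, y_test = [x for x, _ in data[60:]], [y for _, y in data[60:]]

model = fit_gbr(X_train, y_train)
baseline = sum(y_train) / len(y_train)
baseline_mse = mse(y_train, [baseline] * len(y_train))
train_mse = mse(y_train, [predict_gbr(model, row) for row in X_train])
test_mse = mse(y_test, [predict_gbr(model, row) for row in X_test])
print(f"baseline MSE {baseline_mse:.1f}, train MSE {train_mse:.1f}, test MSE {test_mse:.1f}")
```

In practice the embedding step and the regressor would each be swapped for a real implementation, which is the kind of substitution the modular design described above is meant to allow.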
