TL;DR
This paper introduces a novel NLP-based model, SPT, that predicts binary limiting activity coefficients from SMILES codes, significantly improving accuracy over existing models by leveraging synthetic and experimental data.
Contribution
The study develops the SMILES-to-Properties-Transformer (SPT), a new NLP model trained on synthetic and experimental data to accurately predict activity coefficients for unknown molecules.
Findings
SPT reduces mean prediction error by half compared to COSMO-RS and UNIFAC.
Training on synthetic data enhances model generalization to unknown molecules.
SPT outperforms recent machine learning models in activity coefficient prediction.
Abstract
Knowledge of mixtures' phase equilibria is crucial in nature and technical chemistry. Phase equilibria calculations of mixtures require activity coefficients. However, experimental data on activity coefficients are often limited due to the high cost of experiments. For an accurate and efficient prediction of activity coefficients, machine learning approaches have recently been developed. However, current machine learning approaches still extrapolate poorly for activity coefficients of unknown molecules. In this work, we introduce the SMILES-to-Properties-Transformer (SPT), a natural language processing network to predict binary limiting activity coefficients from SMILES codes. To overcome the limitations of available experimental data, we initially train our network on a large dataset of synthetic data sampled from COSMO-RS (10 million data points) and then fine-tune the model on…
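To make the idea of treating SMILES codes as language input more concrete, the following is a minimal sketch of how a solute/solvent pair might be split into tokens before being fed to a transformer. It is not the authors' implementation: the regular expression, the function names `tokenize_smiles` and `encode_pair`, and the special tokens `<SOS>`, `<SEP>`, and `<EOS>` are all assumptions made for illustration, and the actual input format used by SPT is not described in this excerpt.

```python
import re

# Assumed, simplified SMILES token pattern (not taken from the paper):
# bracket atoms, two-letter halogens, two-digit ring closures,
# single-letter atoms, ring-closure digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; fail if any character is skipped."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES string: {smiles!r}")
    return tokens


def encode_pair(solute: str, solvent: str) -> list[str]:
    """Build one token sequence for a binary mixture.

    The special tokens are hypothetical placeholders for whatever
    delimiters a real model would use.
    """
    return (
        ["<SOS>"]
        + tokenize_smiles(solute)
        + ["<SEP>"]
        + tokenize_smiles(solvent)
        + ["<EOS>"]
    )


if __name__ == "__main__":
    # Ethanol as solute, water as solvent.
    print(encode_pair("CCO", "O"))
```

In a two-stage setup like the one the abstract describes, the same tokenization would presumably be shared between pretraining on the synthetic COSMO-RS data and the later fine-tuning step, so that both stages see an identical vocabulary.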
