CheMixHub: Datasets and Benchmarks for Chemical Mixture Property Prediction
Ella Miray Rajaonson, Mahyar Rajabi Kochi, Luis Martin Mejia Mendoza, Seyed Mohamad Moosavi, Benjamin Sanchez-Lengeling

TL;DR
CheMixHub provides a comprehensive benchmark dataset and evaluation framework for predicting properties of chemical mixtures, aiming to advance machine learning models in this underexplored area of chemical research.
Contribution
The paper introduces CheMixHub, a new benchmark dataset with diverse tasks and data splitting techniques, establishing initial deep learning benchmarks for chemical mixture property prediction.
Findings
Established baseline results for multiple chemical mixture prediction tasks.
Demonstrated the effectiveness of different data splitting methods for model evaluation.
Provided a publicly available dataset and code to facilitate future research.
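As an illustration of what a mixture-aware data split can look like, the sketch below holds out a subset of molecules and sends every mixture containing one of them to the test set, so each test mixture includes at least one component never seen in training. This is a generic example, not the splitting code from CheMixHub: the function name, the `held_out_fraction` parameter, and the tuple-of-identifiers input format are all assumptions made for illustration.

```python
import random


def leave_molecules_out_split(mixtures, held_out_fraction=0.2, seed=0):
    """Hypothetical helper: split mixtures by holding out whole molecules.

    `mixtures` is a list of tuples of component identifiers (for example
    SMILES strings). Returns (train_indices, test_indices).
    """
    # Collect every distinct component across all mixtures.
    molecules = sorted({m for mix in mixtures for m in mix})

    # Shuffle reproducibly and pick the molecules to hold out.
    rng = random.Random(seed)
    rng.shuffle(molecules)
    n_held_out = max(1, round(held_out_fraction * len(molecules)))
    held_out = set(molecules[:n_held_out])

    # A mixture goes to test if it contains any held-out molecule.
    train, test = [], []
    for i, mix in enumerate(mixtures):
        if held_out.intersection(mix):
            test.append(i)
        else:
            train.append(i)
    return train, test


if __name__ == "__main__":
    example = [
        ("CCO", "O"),
        ("CCO", "CC(=O)O"),
        ("O", "CC(=O)O"),
        ("c1ccccc1", "CCO"),
        ("O", "CO"),
    ]
    train_idx, test_idx = leave_molecules_out_split(example, held_out_fraction=0.4, seed=1)
    print("train:", train_idx, "test:", test_idx)
```

Note that `held_out_fraction` is a fraction of molecules, not of mixtures, so the resulting test set can be much larger than that fraction when the held-out molecules appear in many mixtures.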
Abstract
Developing improved predictive models for multi-molecular systems is crucial, as nearly every chemical product in use results from a mixture of chemicals. Although mixtures are a vital part of the industry pipeline, the chemical mixture space remains relatively unexplored by the machine learning community. In this paper, we introduce CheMixHub, a holistic benchmark for molecular mixtures, covering a corpus of 11 chemical mixture property prediction tasks, from drug delivery formulations to battery electrolytes, totalling approximately 500k data points gathered and curated from 7 publicly available datasets. CheMixHub introduces various data splitting techniques to assess context-specific generalization and model robustness, providing a foundation for the development of predictive models for chemical mixture properties. Furthermore, we map out the modelling space of deep learning models for…
Taxonomy
Topics: Computational Drug Discovery Methods · Analytical Chemistry and Chromatography · Crystallization and Solubility Studies
