Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models
Guangyong Chen, Pengfei Chen, Chang-Yu Hsieh, Chee-Kong Lee, Benben Liao, Renjie Liao, Weiwen Liu, Jiezhong Qiu, Qiming Sun, Jie Tang, Richard Zemel, Shengyu Zhang

TL;DR
The paper introduces Alchemy, a large and diverse quantum chemistry dataset of organic molecules, to facilitate benchmarking and development of machine learning models in chemistry and material science.
Contribution
It provides a new dataset of 12 quantum mechanical properties for 119,487 organic molecules with up to 14 heavy atoms, expanding the volume and diversity of existing molecular datasets and supporting the validation and development of machine learning models.
Findings
State-of-the-art graph neural networks perform well on Alchemy
The dataset enhances validation of ML models in chemistry
Benchmark results demonstrate the dataset's usefulness
Abstract
We introduce a new molecular dataset, named Alchemy, for developing machine learning models useful in chemistry and material science. As of June 20th 2019, the dataset comprises 12 quantum mechanical properties of 119,487 organic molecules with up to 14 heavy atoms, sampled from the GDB MedChem database. The Alchemy dataset expands the volume and diversity of existing molecular datasets. Our extensive benchmarks of the state-of-the-art graph neural network models on Alchemy clearly manifest the usefulness of new data in validating and developing machine learning models for chemistry and material science. We further launch a contest to attract attention from researchers in the related fields. More details can be found on the contest website (https://alchemy.tencent.com). At the time of the benchmarking experiments, we had generated 119,487 molecules in our Alchemy dataset. More…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Metabolomics and Mass Spectrometry Studies
Methods: Graph Neural Network
