Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks
Yanqiao Zhu, Jeehyun Hwang, Keir Adams, Zhen Liu, Bozhao Nan, Brock Stenfors, Yuanqi Du, Jatin Chauhan, Olaf Wiest, Olexandr Isayev, Connor W. Coley, Yizhou Sun, Wei Wang

TL;DR
This paper introduces the MARCEL benchmark for molecular conformer ensemble learning, evaluating diverse datasets and models to understand how conformer information can enhance molecular property prediction.
Contribution
It presents the first comprehensive benchmark for learning from molecular conformer ensembles, including new datasets, evaluation strategies, and empirical insights.
Findings
Conformer ensemble learning can improve molecular property prediction.
Benchmark results highlight the benefits of explicit conformer modeling.
Diverse datasets extend beyond traditional drug-like molecules.
Abstract
Molecular Representation Learning (MRL) has proven impactful in numerous biochemical applications such as drug discovery and enzyme design. While Graph Neural Networks (GNNs) are effective at learning molecular representations from a 2D molecular graph or a single 3D structure, existing works often overlook the flexible nature of molecules, which continuously interconvert across conformations via chemical bond rotations and minor vibrational perturbations. To better account for molecular flexibility, some recent works formulate MRL as an ensemble learning problem, focusing on explicitly learning from a set of conformer structures. However, most of these studies have limited datasets, tasks, and models. In this work, we introduce the first MoleculAR Conformer Ensemble Learning (MARCEL) benchmark to thoroughly evaluate the potential of learning on conformer ensembles and suggest promising…
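The ensemble formulation described in the abstract can be illustrated with a toy sketch. Each conformer is mapped to a feature vector, and the vectors are pooled with a permutation-invariant mean, so the molecule-level representation does not depend on the order in which conformers are listed. The descriptor used here (mean, max, and min interatomic distance) is a hypothetical stand-in for a learned 3D encoder, and mean pooling is only one possible aggregator. Neither is claimed to be what MARCEL or its baselines actually use.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def conformer_descriptor(coords: Sequence[Coord]) -> List[float]:
    """Hypothetical per-conformer encoder (stand-in for a 3D GNN).

    Uses only interatomic distances, so the features are unchanged by
    rigid rotations and translations of the conformer.
    """
    if len(coords) < 2:
        raise ValueError("need at least two atoms to form distances")
    dists = [math.dist(a, b) for a, b in combinations(coords, 2)]
    return [sum(dists) / len(dists), max(dists), min(dists)]


def ensemble_embedding(ensemble: Sequence[Sequence[Coord]]) -> List[float]:
    """Mean-pool per-conformer features into one molecule-level vector.

    Averaging over the set makes the result independent of conformer
    order (a DeepSets-style set aggregation).
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one conformer")
    feats = [conformer_descriptor(c) for c in ensemble]
    dim = len(feats[0])
    return [sum(f[k] for f in feats) / len(feats) for k in range(dim)]


# Usage: two toy conformers of a hypothetical three-atom molecule.
conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(ensemble_embedding([conf_a, conf_b]))
```

Swapping the order of conf_a and conf_b, or rotating either conformer, leaves the pooled vector unchanged, which is the property an ensemble-level representation is meant to have.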
Peer Reviews
Decision: ICLR 2024 poster
The paper's main strength, in this reviewer's view, is that it thoroughly compares its approach to the state of the art. Its main value is likely that it can serve as a benchmarking basis for various approaches in the field. The main original aspect is the use of an ensemble-based approach, which makes it possible to incorporate the dynamic nature of molecules. The paper is also well written and meticulous in its comparison with the state of the art…
None
In molecular machine learning, accounting for the dynamic structural transitions of molecules is extremely important. While there are studies that predict molecular dynamics simulations with machine learning, and existing research that examines the impact and significance of conformers on machine learning predictions, the available data and tasks are extremely limited. This makes it difficult to compare multiple methods objectively on a common foundation. In this context, the benchmark proposed in this paper…
In the four tasks developed in this study, the objective is defined as predicting the Boltzmann average of various properties over multiple conformers. Under this setting, it seems intuitive that using information from multiple conformers would naturally improve prediction accuracy. Therefore, it has not been shown that considering multiple conformers improves machine learning predictions on real data (e.g., actual experimental measurements of molecules rather than computed values).
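To make the reviewer's point concrete: a Boltzmann-averaged target weights each conformer's property P_i by exp(-E_i / k_B T), normalized over the ensemble, i.e. <P> = sum_i P_i exp(-E_i / k_B T) / sum_j exp(-E_j / k_B T). Because the target is itself a function of the whole ensemble, a model that sees several conformers has direct access to the quantities being averaged. The minimal sketch below computes this average. The function names are hypothetical, and energies in kcal/mol at 298.15 K are assumptions of the sketch, not the benchmark's documented settings.

```python
import math
from typing import List, Sequence

# Boltzmann constant in kcal/(mol*K); the energy unit is an assumption of this sketch.
K_B_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(energies: Sequence[float], temperature: float = 298.15) -> List[float]:
    """Normalized Boltzmann weights for a set of conformer energies."""
    if not energies:
        raise ValueError("need at least one conformer energy")
    kt = K_B_KCAL_PER_MOL_K * temperature
    e_min = min(energies)
    # Shift by the lowest energy before exponentiating to avoid overflow;
    # the shift cancels out in the normalization.
    unnorm = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def boltzmann_average(properties: Sequence[float], energies: Sequence[float],
                      temperature: float = 298.15) -> float:
    """Boltzmann-weighted average of a per-conformer property."""
    if len(properties) != len(energies):
        raise ValueError("properties and energies must have the same length")
    weights = boltzmann_weights(energies, temperature)
    return sum(w * p for w, p in zip(weights, properties))
```

With equal energies the weights are uniform, so boltzmann_average([1.0, 3.0], [0.0, 0.0]) returns 2.0. Raising the second conformer's energy to 10 kcal/mol pushes the average essentially to the first conformer's value, since its weight becomes negligible at room temperature.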
Originality: This work curates novel datasets and benchmarks for an under-explored problem, molecular conformer ensemble learning. Quality: Detailed information about dataset curation, baseline experiment settings, and results is clearly presented. Clarity: The writing of this paper is excellent and well organized. Significance: The presented MARCEL benchmark will be useful and impactful for researchers developing novel molecular representation learning methods on multiple molecular conformers…
(1) For 3D models, it is recommended to add at least one 3D graph transformer model as a baseline, such as Equiformer [1]. (2) It is recommended to add a discussion of [2], as [2] proposes a molecular conformer ensemble learning module named ConfDSS. It is also recommended to add it as a baseline if it can be applied to the tasks in MARCEL. [1] Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs. ICLR 2023. [2] Fast Quantum Property Prediction via Deeper 2D and 3D Gr…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Various Chemistry Research Topics
