Unraveling Molecular Structure: A Multimodal Spectroscopic Dataset for Chemistry
Marvin Alberts, Oliver Schilter, Federico Zipoli, Nina Hartrampf, Teodoro Laino

TL;DR
This paper introduces a large, multimodal spectroscopic dataset for 790,000 molecules, enabling machine learning models to integrate multiple spectroscopic techniques for improved molecular structure elucidation.
Contribution
The creation of a comprehensive multimodal spectroscopic dataset and benchmarks for machine learning tasks in molecular structure prediction.
Findings
Dataset includes simulated NMR, IR, and MS spectra for 790k molecules.
Benchmarks for structure elucidation, spectrum prediction, and functional group identification.
Potential to automate and improve molecular discovery processes.
Abstract
Spectroscopic techniques are essential tools for determining the structure of molecules. Different spectroscopic techniques, such as nuclear magnetic resonance (NMR), infrared spectroscopy, and mass spectrometry, provide insight into the molecular structure, including the presence or absence of functional groups. Chemists leverage the complementary nature of the different methods to their advantage. However, the lack of a comprehensive multimodal dataset, containing spectra from a variety of spectroscopic techniques, has limited machine-learning approaches mostly to single-modality tasks for predicting molecular structures from spectra. Here we introduce a dataset comprising simulated H-NMR, C-NMR, HSQC-NMR, infrared, and mass spectra (positive and negative ion modes) for 790k molecules extracted from chemical reactions in patent data. This dataset enables the development of…
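To make the multimodal idea concrete, the modalities named in the abstract can be pictured as one record per molecule. The sketch below is illustrative only: the class name `SpectralRecord`, the modality keys, and the flat list-of-floats representation are assumptions made for this example and are not the dataset's actual schema or file format. The numbers are placeholders, not real spectra.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Modalities named in the abstract. The key strings are hypothetical labels
# chosen for this sketch, not identifiers from the published dataset.
MODALITIES = ("h_nmr", "c_nmr", "hsqc_nmr", "ir", "ms_positive", "ms_negative")


@dataclass
class SpectralRecord:
    """Hypothetical container for one molecule and its simulated spectra."""

    smiles: str
    # Each spectrum is simplified to a flat list of floats; a real HSQC
    # spectrum is two-dimensional and would need a richer representation.
    spectra: Dict[str, List[float]] = field(default_factory=dict)

    def available(self) -> List[str]:
        """Return the modalities present for this molecule, in canonical order."""
        return [m for m in MODALITIES if m in self.spectra]

    def missing(self) -> List[str]:
        """Return the modalities absent for this molecule, in canonical order."""
        return [m for m in MODALITIES if m not in self.spectra]


# Placeholder values for ethanol (SMILES "CCO"); not taken from the dataset.
record = SpectralRecord(
    smiles="CCO",
    spectra={"ir": [0.1, 0.8, 0.3], "h_nmr": [1.2, 3.7]},
)
print(record.available())
print(record.missing())
```

Keeping every modality under a single per-molecule record is one way a model could consume several spectra at once instead of one technique at a time, which is the single-modality limitation the abstract describes.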
Taxonomy
Topics: Spectroscopy and Chemometric Analyses · Advanced Chemical Sensor Technologies
