Comprehensive Curation and Harmonization of Small-Molecule MS/MS Libraries in Spectraverse
Vishu Gupta, Hantao Qiang, Hsin-Hsiang Chung, Ehud Herbst, Michael A. Skinnider

TL;DR
Spectraverse is a new, comprehensive library of high-quality mass spectra for small molecules, designed to improve metabolite identification and machine learning in metabolomics.
Contribution
Spectraverse introduces a harmonized, curated MS/MS library addressing quality and metadata issues in public spectral databases.
Findings
Spectraverse combines spectra from major repositories and previously overlooked resources, harmonized through an extensive preprocessing pipeline.
Curating the library exposed previously undocumented pitfalls in public libraries that may have affected the training of machine-learning models.
Spectraverse offers the broadest coverage of chemical space and ionization modes for metabolomics to date.
Abstract
Reference libraries of tandem mass spectra (MS/MS) are widely used for metabolite identification in untargeted metabolomics and to train machine-learning models for metabolite annotation. However, public spectral libraries are scattered across disparate databases and contain spectra that are of low resolution or quality, lack critical metadata, or carry chemically incoherent annotations. Addressing these issues requires extensive preprocessing and considerable expertise in mass spectrometry, which presents a significant barrier to investigators interested in developing their own machine-learning models. Here, we present Spectraverse, a comprehensive and extensively curated library of public MS/MS spectra from small molecules. We assembled reference spectra from both major repositories and previously overlooked resources and then developed a preprocessing pipeline to harmonize…
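To make the kinds of problems named in the abstract concrete, the following is a minimal sketch of two basic checks a curation step might apply: flagging spectra with missing metadata, and flagging annotations whose precursor m/z does not match the annotated structure and adduct. It is not the Spectraverse pipeline. The `Spectrum` class, the `quality_issues` function, the three-entry adduct table, and the 10 ppm tolerance are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Mass shifts (Da) for a few common singly charged adducts.
# Illustrative and deliberately incomplete; a real pipeline would cover many more.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989221,
    "[M-H]-": -1.007276,
}


@dataclass
class Spectrum:
    """Hypothetical minimal record for one library spectrum."""
    peaks: List[Tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)
    precursor_mz: Optional[float] = None
    adduct: Optional[str] = None
    neutral_mass: Optional[float] = None  # monoisotopic mass of the annotated structure


def quality_issues(spec: Spectrum, ppm_tol: float = 10.0) -> List[str]:
    """Return a list of human-readable problems found in one spectrum."""
    issues = []
    if not spec.peaks:
        issues.append("no peaks")
    if spec.precursor_mz is None or spec.adduct is None or spec.neutral_mass is None:
        # Without these fields the annotation cannot be checked at all.
        issues.append("missing metadata")
        return issues
    shift = ADDUCT_SHIFTS.get(spec.adduct)
    if shift is None:
        issues.append("unsupported adduct")
        return issues
    # Compare the recorded precursor m/z with the value implied by the annotation.
    expected_mz = spec.neutral_mass + shift
    ppm_error = abs(spec.precursor_mz - expected_mz) / expected_mz * 1e6
    if ppm_error > ppm_tol:
        issues.append("precursor m/z inconsistent with annotation")
    return issues


# Caffeine (monoisotopic mass 194.080376 Da) annotated as [M+H]+.
good = Spectrum(peaks=[(138.066, 100.0)], precursor_mz=195.0877,
                adduct="[M+H]+", neutral_mass=194.080376)
# Same annotation, but the recorded precursor corresponds to a different adduct.
bad = Spectrum(peaks=[(138.066, 100.0)], precursor_mz=217.0696,
               adduct="[M+H]+", neutral_mass=194.080376)

print(quality_issues(good))
print(quality_issues(bad))
print(quality_issues(Spectrum()))
```

Returning a list of issues, rather than a single pass/fail flag, lets a curator count how often each problem occurs per source database before deciding what to discard.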
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Mass Spectrometry Techniques and Applications · Computational Drug Discovery Methods
