OctoChemDB: An Aggregated Database for Small Molecule Identification Using High-Resolution MS Data
Ricardo Silvestre, Rémi Martinent, Laure Menin, Natalia Gasilova, Vincent Mutel, Cyril Portmann, Luc Patiny

TL;DR
OctoChemDB is a centralized database that aggregates and harmonizes chemical, biological, and spectral data to improve small molecule identification using high-resolution mass spectrometry.
Contribution
OctoChemDB introduces a REST API and web interface for m/z-based searches and spectral analysis, integrating data from multiple open-access resources.
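An m/z-based search starts from the theoretical m/z of a candidate ion. The sketch below computes the monoisotopic [M+H]+ m/z from a flat molecular formula and the ppm error against an observed value. It is a generic illustration, not OctoChemDB's code. The helper names (`parse_formula`, `mz_protonated`, `ppm_error`) are hypothetical, only a few elements are covered, and only singly protonated ions are assumed.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope; small illustrative subset.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON_MASS = 1.00727646688  # added to the neutral mass to form [M+H]+


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a flat formula such as 'C8H10N4O2' (no brackets, hydrates or charges)."""
    counts: dict[str, int] = {}
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass: sum of element mass times atom count."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


def mz_protonated(formula: str) -> float:
    """m/z of the singly charged [M+H]+ ion (charge z = 1)."""
    return monoisotopic_mass(formula) + PROTON_MASS


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million relative to the theoretical value."""
    return (observed - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    for name, formula in [("caffeine", "C8H10N4O2"), ("MDMA", "C11H15NO2")]:
        print(f"{name}: [M+H]+ = {mz_protonated(formula):.4f}")
    # prints "caffeine: [M+H]+ = 195.0877" and "MDMA: [M+H]+ = 194.1176"
```

A real pipeline would also consider other adducts (for example [M+Na]+) and multiply charged ions. These are omitted here to keep the arithmetic visible.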
Findings
OctoChemDB successfully aggregates data from PubChem, MassBank, and GNPS into a unified database.
The platform enables accurate identification of compounds like MDMA and caffeine through spectral matching and fragmentation analysis.
The REST API and web interface streamline dereplication workflows and support integration into external tools.
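Spectral matching usually scores an experimental MS/MS spectrum against reference spectra. A common choice is a cosine score computed over peaks that are matched within an m/z tolerance. The sketch below shows one simple variant: greedy one-to-one matching, strongest pairs first, using raw intensities. This summary does not say that OctoChemDB scores spectra this way, so treat the function name `cosine_similarity`, the 0.01 Da default tolerance, and the peak lists in the usage note as illustrative assumptions.

```python
import math

Peak = tuple[float, float]  # (m/z, intensity)


def cosine_similarity(query: list[Peak], reference: list[Peak], tol: float = 0.01) -> float:
    """Cosine score after greedy one-to-one peak matching within +/- tol (Da)."""
    # Collect every query/reference pair whose m/z values agree within tol.
    candidates = []
    for i, (mz_q, int_q) in enumerate(query):
        for j, (mz_r, int_r) in enumerate(reference):
            if abs(mz_q - mz_r) <= tol:
                candidates.append((int_q * int_r, i, j))
    # Match the pairs with the largest intensity product first.
    candidates.sort(reverse=True)

    used_q: set[int] = set()
    used_r: set[int] = set()
    dot = 0.0
    for product, i, j in candidates:
        if i in used_q or j in used_r:
            continue  # each peak may be matched at most once
        used_q.add(i)
        used_r.add(j)
        dot += product

    norm_q = math.sqrt(sum(intensity ** 2 for _, intensity in query))
    norm_r = math.sqrt(sum(intensity ** 2 for _, intensity in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)
```

With made-up peak lists, `cosine_similarity([(100.0, 1.0), (200.0, 1.0)], [(100.005, 1.0)])` matches only the peak near m/z 100 and returns about 0.707. Identical spectra score 1.0. Many tools also square-root or otherwise weight intensities before scoring. That weighting is a design choice that this sketch leaves out.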
Abstract
High-resolution mass spectrometry (HRMS) is a cornerstone technology for dereplicating small molecules by comparing their MS spectral data with references in extensive chemical databases. However, most existing chemical databases lack robust support for processing spectral data or for direct m/z-based searches, which limits their usefulness for rapid compound identification. To address this, we developed OctoChemDB, a centralized database that aggregates and harmonizes chemical, biological, and spectral data from multiple open-access resources such as PubChem, MassBank, and GNPS. To make these data programmatically accessible, we implemented a Representational State Transfer Application Programming Interface (REST API) that allows external tools and software to query the database using customizable parameters. This API serves as the core access point for developers and researchers to integrate…
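A direct m/z-based search reduces to a tolerance lookup over masses sorted in ascending order. The sketch below converts an observed [M+H]+ m/z to a neutral mass and uses binary search (`bisect`) to return every candidate within a ppm window. The three-entry `CANDIDATES` table and the `search_mz` function are hypothetical stand-ins. They are not OctoChemDB data or its REST API. The listed masses are monoisotopic values computed from the formulas shown.

```python
import bisect

PROTON_MASS = 1.00727646688  # Da; assumes singly protonated [M+H]+ ions

# Hypothetical local table of (neutral monoisotopic mass in Da, label), sorted by mass.
CANDIDATES = sorted([
    (194.0803756, "caffeine (C8H10N4O2)"),
    (180.0647255, "theobromine (C7H8N4O2)"),
    (193.1102787, "MDMA (C11H15NO2)"),
])
MASSES = [mass for mass, _ in CANDIDATES]


def search_mz(observed_mz: float, ppm: float = 5.0) -> list[str]:
    """Return labels whose [M+H]+ m/z lies within +/- ppm of observed_mz."""
    neutral = observed_mz - PROTON_MASS
    delta = observed_mz * ppm * 1e-6  # absolute window (Da) derived from the ppm tolerance
    lo = bisect.bisect_left(MASSES, neutral - delta)
    hi = bisect.bisect_right(MASSES, neutral + delta)
    return [label for _, label in CANDIDATES[lo:hi]]


if __name__ == "__main__":
    print(search_mz(195.0877))  # ['caffeine (C8H10N4O2)']
    print(search_mz(200.0))     # []
```

Keeping the masses sorted makes each lookup logarithmic in the table size. Isomers such as theobromine and theophylline share a formula, so an m/z hit alone cannot separate them. Spectral comparison is then needed.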
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Mass Spectrometry Techniques and Applications · Computational Drug Discovery Methods
