# OctoChemDB: An Aggregated Database for Small Molecule Identification Using High-Resolution MS Data

**Authors:** Ricardo Silvestre, Rémi Martinent, Laure Menin, Natalia Gasilova, Vincent Mutel, Cyril Portmann, Luc Patiny

PMC · DOI: 10.1021/acs.analchem.5c06761 · 2026-02-16

## TL;DR

OctoChemDB is a centralized database that aggregates and harmonizes chemical, biological, and spectral data to improve small molecule identification using high-resolution mass spectrometry.

## Contribution

OctoChemDB introduces a REST API and web interface for m/z-based searches and spectral analysis, integrating data from multiple open-access resources.

## Key findings

- OctoChemDB successfully aggregates data from PubChem, MassBank, and GNPS into a unified database.
- Case studies on MDMA and caffeine show the platform proposing structural hypotheses and matching experimental spectra to database entries through spectral matching and fragmentation analysis.
- The REST API and web interface streamline dereplication workflows and support integration into external tools.

## Abstract

High-resolution mass spectrometry (HRMS) is a cornerstone technology to dereplicate small molecules by comparing their MS spectral data to references in extensive chemical databases. However, most existing chemical databases lack robust support for processing spectral data or enabling direct m/z-based searches, limiting their usefulness for rapid compound identification. To address this, we developed OctoChemDB, a centralized database that aggregates and harmonizes chemical, biological, and spectral data from multiple open-access resources such as PubChem, MassBank, and GNPS. To make this data programmatically accessible, we implemented a REpresentational State Transfer Application Program Interface (REST API) that allows external tools and software to query the database using customizable parameters. This API serves as the core access point for developers and researchers to integrate OctoChemDB data into their own workflows and applications. As a practical demonstration of how the API can be used, we built a web application, available at https://octochemdb.cheminfo.org/, that enables users to perform m/z-based searches, predict molecular formulas, assess isotopic similarity, analyze fragmentation patterns, and retrieve associated literature and patents. This web interface serves as a user-friendly example of how the underlying database and API can be leveraged to accelerate small molecule identification. We illustrate the utility of the platform through case studies, including the identification of 3,4-methylenedioxymethamphetamine (MDMA) and caffeine, demonstrating its effectiveness in proposing structural hypotheses, matching experimental spectra with database entries, and streamlining dereplication workflows. The entire project, including source code, is available at https://github.com/cheminfo/octochemdb.
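
To make the idea of an m/z-based search concrete, the following is a minimal local sketch in Python. It does not call the OctoChemDB REST API and does not reproduce its implementation; the function names (`mz_protonated`, `search_by_mz`), the two-entry candidate table, the [M+H]+ adduct, and the 5 ppm default tolerance are all assumptions chosen for illustration. The elemental formulas used for caffeine and MDMA are the standard ones and are not taken from the paper.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON = 1.007276466  # mass of a proton, used for the [M+H]+ adduct

# Hypothetical two-entry reference table (illustration only).
CANDIDATES = {
    "caffeine": "C8H10N4O2",
    "MDMA": "C11H15NO2",
}


def parse_formula(formula):
    """Turn a simple formula such as 'C8H10N4O2' into {'C': 8, 'H': 10, ...}."""
    counts = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(count) if count else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def mz_protonated(formula):
    """Theoretical m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def search_by_mz(observed_mz, candidates=CANDIDATES, tolerance_ppm=5.0):
    """Return candidates whose [M+H]+ m/z lies within the ppm tolerance,
    sorted by absolute mass error (best match first)."""
    hits = []
    for name, formula in candidates.items():
        theoretical = mz_protonated(formula)
        error = ppm_error(observed_mz, theoretical)
        if abs(error) <= tolerance_ppm:
            hits.append((name, formula, theoretical, error))
    return sorted(hits, key=lambda hit: abs(hit[3]))
```

With this sketch, `search_by_mz(195.0877)` returns only the caffeine entry, whose theoretical [M+H]+ m/z is about 195.0877, and `search_by_mz(194.1176)` returns only MDMA. A real dereplication search would cover many adducts and far more candidates, which is the kind of query the database and API described above are meant to serve.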

## Linked entities

- **Chemicals:** 3,4-methylenedioxymethamphetamine (MDMA; PubChem CID 1615), caffeine (PubChem CID 2519)

## Full-text entities

- **Genes:** DIH1 (diaphragmatic hernia 1) [NCBI Gene 1732] {aka HCD}
- **Chemicals:** norepinephrine (MESH:D009638), H2O (MESH:D014867), Acetonitrile (MESH:C032159), methanol (MESH:D000432), serotonin (MESH:D012701), dopamine (MESH:D004298), 1,3-Benzodioxole (MESH:C040539), 3,4-Methylenedioxymethamphetamine (MESH:D018817), Caffeine (MESH:D002110), B0-3 (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12961639/full.md

---
Source: https://tomesphere.com/paper/PMC12961639