JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data

Apurva Kalia; Yan Zhou Chen; Dilip Krishnan; Soha Hassoun

arXiv:2411.14464 · q-bio.QM · June 10, 2025

TL;DR

JESTR introduces a joint embedding space technique that improves the accuracy of molecular annotation in untargeted metabolomics. It embeds spectra and molecules in a shared space and ranks candidate molecules by the cosine similarity between each candidate's embedding and the query spectrum's embedding.
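The ranking step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the query spectrum and the candidate molecules have already been mapped to embedding vectors by trained encoders (not shown), and the function names `cosine_similarity` and `rank_candidates` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_candidates(
    spectrum_emb: Sequence[float],
    candidate_embs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (candidate_id, score) pairs, most similar candidate first.

    Hypothetical helper: embeddings are assumed to come from already-trained
    spectrum and molecule encoders that share one embedding space.
    """
    scored = [
        (cand_id, cosine_similarity(spectrum_emb, emb))
        for cand_id, emb in candidate_embs.items()
    ]
    # Highest cosine similarity first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy 2-D embeddings, purely illustrative.
query = [1.0, 0.0]
candidates = {"mol_a": [0.0, 1.0], "mol_b": [0.9, 0.1], "mol_c": [1.0, 1.0]}
ranking = rank_candidates(query, candidates)
print([cand_id for cand_id, _ in ranking])  # → ['mol_b', 'mol_c', 'mol_a']
```

Because cosine similarity ignores vector length, only the direction of each embedding matters, so the top-ranked candidate is the one whose embedding points most nearly in the same direction as the spectrum's.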

Contribution

The paper presents a novel joint embedding space approach for metabolite annotation, outperforming existing tools and pretrained models in accuracy.

Findings

1. On average, JESTR outperforms other tools by 23.6%-71.6% across rank@1 through rank@5.
2. Regularization with candidate molecules boosts performance by 11.4%.
3. JESTR surpasses SIRIUS and CFM-ID by 31% and 238%, respectively.
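The rank@k figures above can be made concrete with a short helper. This sketch assumes the usual definition of rank@k in candidate-ranking annotation (the percentage of query spectra whose correct molecule appears among the top k ranked candidates); the paper's exact conventions, such as tie handling, may differ, and `rank_at_k` is a hypothetical name.

```python
from typing import Sequence


def rank_at_k(
    ranked_lists: Sequence[Sequence[str]],
    true_ids: Sequence[str],
    k: int,
) -> float:
    """Percentage of queries whose true molecule is within the top-k candidates."""
    if len(ranked_lists) != len(true_ids):
        raise ValueError("need exactly one ranked candidate list per query")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not ranked_lists:
        return 0.0
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, true_ids) if truth in ranked[:k]
    )
    return 100.0 * hits / len(ranked_lists)


# Illustrative toy data: four query spectra, each with a ranked candidate list.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["a", "c", "b"]]
truth = ["a", "a", "a", "b"]
for k in range(1, 4):
    print(f"rank@{k}: {rank_at_k(ranked, truth, k):.1f}%")
```

Since a hit at rank k is also a hit at every larger k, rank@k never decreases as k grows, which is why results are typically reported as a curve over rank@1 through rank@5.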

Abstract

Motivation: A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectrum prediction and in spectrum-to-molecular-fingerprint (FP) prediction, annotation rates remain low.

Results: In this paper, we introduce JESTR, a novel paradigm for annotation. Unlike prior approaches that explicitly construct molecular fingerprints or spectra, JESTR leverages the insight that molecules and their corresponding spectra are views of the same data and embeds their representations in a joint space. Candidate structures are ranked by the cosine similarity between the embedding of the query spectrum and the embedding of each candidate. We evaluate JESTR against mol-to-spec and spec-to-FP annotation tools on three datasets. On average, for rank@[1-5], JESTR outperforms other tools by 23.6%-71.6%. We further demonstrate…
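One common way to learn a joint space in which matching spectrum and molecule embeddings have high cosine similarity is a contrastive, InfoNCE-style objective. The text above does not state JESTR's training loss, so the sketch below is an assumption: a generic InfoNCE loss for a single spectrum, in which the regularization with candidate molecules listed among the findings is interpreted as adding the spectrum's candidate molecules (for example, structures sharing its precursor mass or formula) as extra negatives. The names `info_nce_loss` and `_cosine` and the default `temperature=0.1` are hypothetical.

```python
import math
from typing import Sequence


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce_loss(
    spectrum_emb: Sequence[float],
    positive_mol_emb: Sequence[float],
    negative_mol_embs: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Generic InfoNCE loss for one spectrum (an assumed objective, not JESTR's confirmed loss).

    Equals -log of the softmax probability assigned to the true molecule among
    the true molecule plus all negatives, using temperature-scaled cosine scores.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = [_cosine(spectrum_emb, positive_mol_emb) / temperature]
    logits += [_cosine(spectrum_emb, neg) / temperature for neg in negative_mol_embs]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy embeddings, purely illustrative.
spectrum = [1.0, 0.0]
true_molecule = [1.0, 0.1]
in_batch_negatives = [[0.0, 1.0]]             # an easy, dissimilar molecule
candidate_negatives = [[1.0, 0.2], [0.9, 0.3]]  # hard negatives close to the target

base = info_nce_loss(spectrum, true_molecule, in_batch_negatives)
regularized = info_nce_loss(spectrum, true_molecule, in_batch_negatives + candidate_negatives)
print(f"in-batch negatives only: {base:.6f}")
print(f"with candidate negatives: {regularized:.6f}")
```

Adding hard candidate negatives enlarges the softmax denominator with high-scoring terms, so the loss rises until the model pushes those candidates away from the spectrum. Under this reading, that pressure is what would help the model discriminate the true molecule from structurally similar candidates at ranking time.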

