Small molecule retrieval from tandem mass spectrometry: what are we optimizing for?

Gaetan De Waele; Marek Wydmuch; Krzysztof Dembczyński; Wojciech Kotłowski; Willem Waegeman

arXiv:2602.16507 · cs.LG · February 19, 2026

Open Access · 3 Reviews

TL;DR

This paper analyzes how different loss functions in deep learning models for LC-MS/MS compound identification affect the trade-off between fingerprint accuracy and molecular retrieval success, providing theoretical insights and practical guidance.

Contribution

It introduces a theoretical framework with regret bounds to understand the impact of loss functions on retrieval performance in mass spectrometry analysis.

Findings

01 Identifies a fundamental trade-off between fingerprint similarity and molecular retrieval.

02 Derives regret bounds that characterize when Bayes-optimal decisions diverge.

03 Provides guidance on loss function and fingerprint selection based on candidate set similarity.
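
A toy illustration of how the two Bayes-optimal decisions in the second finding can diverge, as a minimal sketch. The three candidates, their 6-bit fingerprints, and the posterior probabilities are invented for illustration and do not come from the paper. The sketch relies on two standard facts: thresholding per-bit marginals at 0.5 minimizes expected Hamming loss, and returning the most probable candidate maximizes expected top-1 retrieval accuracy.

```python
# Toy posterior over three hypothetical candidates. Names, bits, and
# probabilities are illustrative assumptions, not data from the paper.
candidates = {
    "A": ([1, 1, 1, 0, 0, 0], 0.4),
    "B": ([0, 0, 0, 1, 1, 0], 0.3),
    "C": ([0, 0, 0, 1, 1, 1], 0.3),
}
n_bits = 6


def hamming(x, y):
    """Number of positions where two bit vectors differ."""
    return sum(a != b for a, b in zip(x, y))


def expected_hamming(pred):
    """Expected Hamming loss of a predicted fingerprint under the toy posterior."""
    return sum(p * hamming(pred, fp) for fp, p in candidates.values())


# Fingerprint objective: threshold each bit's marginal probability at 0.5.
marginals = [sum(p * fp[i] for fp, p in candidates.values()) for i in range(n_bits)]
fingerprint_optimal = [int(m > 0.5) for m in marginals]

# Retrieval with that fingerprint: nearest candidate in Hamming distance.
retrieved = min(candidates, key=lambda name: hamming(fingerprint_optimal, candidates[name][0]))

# Retrieval objective: return the most probable candidate directly.
most_probable = max(candidates, key=lambda name: candidates[name][1])

print("fingerprint-optimal prediction:", fingerprint_optimal)
print("candidate it retrieves:", retrieved)
print("most probable candidate:", most_probable)
print("expected Hamming loss (fingerprint-optimal):", expected_hamming(fingerprint_optimal))
print("expected Hamming loss (fingerprint of A):", expected_hamming(candidates["A"][0]))
```

In this toy case the thresholded fingerprint coincides with B's fingerprint and therefore retrieves B (posterior 0.3), even though A (posterior 0.4) is the most probable candidate. Predicting A's fingerprint instead would give the better retrieval decision at the cost of a higher expected Hamming loss.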

Abstract

One of the central challenges in the computational analysis of liquid chromatography-tandem mass spectrometry (LC-MS/MS) data is to identify the compounds underlying the output spectra. In recent years, this problem has increasingly been tackled using deep learning methods. A common strategy involves predicting a molecular fingerprint vector from an input mass spectrum, which is then used to search for matches in a chemical compound database. While various loss functions are employed in training these predictive models, their impact on model performance remains poorly understood. In this study, we investigate commonly used loss functions, deriving novel regret bounds that characterize when Bayes-optimal decisions for these objectives must diverge. Our results reveal a fundamental trade-off between the two objectives of (1) fingerprint similarity and (2) molecular retrieval. Optimizing for…
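
The "predict a fingerprint, then search a database" strategy described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the predicted bit probabilities, the candidate names (`cand_1` and so on), and their fingerprints are hypothetical, and Tanimoto similarity is assumed as the matching score because it is a common choice for binary fingerprints.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_candidates(predicted_bits, candidate_fps):
    """Return candidate names sorted from most to least similar."""
    scored = [(tanimoto(predicted_bits, fp), name) for name, fp in candidate_fps.items()]
    # Sort by descending similarity, breaking ties by name for determinism.
    return [name for _, name in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Hypothetical model output: one probability per fingerprint bit.
predicted_probs = [0.1, 0.9, 0.2, 0.05, 0.8, 0.3, 0.1, 0.7, 0.2, 0.4]
predicted_bits = {i for i, p in enumerate(predicted_probs) if p > 0.5}

# Hypothetical candidate set, each fingerprint given as its on-bit indices.
candidate_fps = {
    "cand_1": {1, 4, 7, 9},
    "cand_2": {1, 4, 8},
    "cand_3": {2, 3},
}

ranking = rank_candidates(predicted_bits, candidate_fps)
print(ranking)
```

Retrieval is counted as successful at top-1 when the true compound is ranked first. Fingerprint similarity, by contrast, scores how close the predicted bits are to the true compound's fingerprint regardless of the other candidates.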

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 2

Strengths

1. Clearly identifies and explains a practically important but previously overlooked trade-off in molecular retrieval.
2. Provides a unified theoretical framework that aligns well with empirical results.
3. Uses a standardized benchmark with appropriate splitting to ensure fair evaluation.
4. Systematic comparison across a wide range of commonly used loss functions.

Weaknesses

1. Choice of Fingerprint Representation May Limit Generality: The study uses only Morgan fingerprints (radius=2, 4096 bits) as the molecular representation throughout all experiments. While this is a widely used fingerprint type, I am not sure whether the observed trade-off between fingerprint similarity and retrieval performance would still hold for other representations, such as MACCS keys, ECFP with different radii, topological torsions, or even learned neural fingerprints. Since different fi

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- The paper is well written and easy to follow.
- Standard datasets are used for analysis.

Weaknesses

- The experimental design appears to be flawed. The paper currently demonstrates (e.g., Fig. 1) that using supervised losses for fingerprint prediction (e.g., BCE) results in better fingerprint prediction, while using supervised losses for retrieval (e.g., contrastive loss) results in better retrieval performance. This is an expected outcome and does not adequately support the claim that there is a “fundamental trade-off between the two objectives.” This claim could be better supported by, for e

Reviewer 03 · Rating 4 · Confidence 4

Strengths

The paper points out some key flaws in the algorithmic approach taken by prior work and offers some interesting observations, both empirical and theoretical, on how to navigate them. The mass spectrometry problem is challenging and has broad applications if new methods prove accurate at it.

Weaknesses

It was hard for me to understand what the central contribution of this paper was. It targets an interesting application, but does not provide a new modeling technique or achieve SOTA performance on the application. It provides an interesting observation about the impact of different loss functions on eval metrics. Then, it does some theoretical work to show why this dependence on the loss function may exist. If reviewing as an applied paper, the novelty seems limited because the task and models

Taxonomy

Topics: Computational Drug Discovery Methods · Mass Spectrometry Techniques and Applications · Forensic Fingerprint Detection Methods