Learning from All Views: A Multiview Contrastive Framework for Metabolite Annotation

Yan Zhou Chen; Soha Hassoun

PMC · DOI:10.1021/acs.analchem.5c05675·February 23, 2026

Learning from All Views: A Multiview Contrastive Framework for Metabolite Annotation

Yan Zhou Chen, Soha Hassoun

PDF

Open Access

TL;DR

This paper introduces MVP, a new method for improving metabolite annotation by combining multiple data views into a shared learning framework.

Contribution

MVP introduces a multiview contrastive learning framework that jointly learns from multiple data views to improve metabolite annotation.

Findings

01

MVP outperforms rank aggregation strategies using consensus spectra for metabolite annotation.

02

Using consensus spectra, MVP achieves 36.0% and 14.0% rank@1 for mass and formula-based retrieval.

03

MVP performs as well or better than existing methods when using individual spectra.

Abstract

Metabolomics, enabled by high-throughput mass spectrometry, promises to advance our understanding of cellular biochemistry and guide new discoveries in disease mechanisms, drug development, and personalized medicine. However, as the assignment of molecular structures to measured spectra is challenging, annotation rates remain low and hinder potential advancements. We present MultiView Projection (MVP), a novel framework for learning a joint embedding space between molecules and spectra by leveraging multiple data views: molecular graphs, molecular fingerprints, spectra, and consensus spectra. MVP builds on contrastive multiview learning to capture mutual information across views, leading to more robust and generalizable representations for spectral annotation. Unlike prior approaches that consider multiple views via concatenation or as targets of auxiliary tasks, MVP learns from all…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Gallus gallus(bantam · species)

Chemicals8

terpenoid H MVP amino acids Si shikimate Na alkaloid

Diseases1

MVP

Figures4

Click any figure to enlarge with its caption.

Three paradigms for spectra annotation: (A) inverse methods include de novo generation and fingerprint prediction from the query spectrum; (B) forward methods predict spectra from molecular candidates to match against the query spectrum; and (C) view alignment methods minimize the distance between a molecule and its spectral embedding.

Different views of molecular data (A) and spectral data (B). The views at the center represent the most widely used views for metabolite annotation: spectrum and graph. Other representations not shown include latent embeddings from pretrained models for both molecules and spectra.

Overview of MVP. (A) Four encoders are trained to place the four matching views close in the joint embedding space. (B) MVP employs the full-graph paradigm, where the overlap of any number of views denotes the number of view pairs considered during training. For example, considering all four views results in six view pairs used for training.

Case study results. (A) Total abundance of each pathway per sample type. (B) Putative metabolites identified by MVP. Each molecule is labeled with an ID, and its pathway and regulation. The IDs correspond to the labels in the volcano plots that show the fold change and p-value between CIC and CON features (C) and CAC and CON features (D).

Tables2

1. MassSpecGym Statistics on the Whole Dataset (A) and the [M + H]+ Subset (B)

	A. MassSpecGym			B. [M + H]⁺ subset
	train	val	test	train	val	test
spectra count	194,119	19,429	17,556	166,479	14,692	14,066
molecule count	25,046	3386	3170	24,100	3244	3044

2. Ranking Performance on MassSpecGym Compared to Other Tools

model	rank@1 (↑)	rank@5 (↑)	rank@20 (↑)	MCES@1 (↓)
A. candidates by mass
MIST	14.6 (13.8–15.5)	34.87 (33.7–36.1)	59.15 (57.9–60.4)	15.4 (15.1–15.3)
ESP	10.7 (10.0–11.5)	24.8 (23.8–25.9)	42.7 (41.5–43.9)	35.4 (34.9–35.8)
JESTR	15.1 (14.3–16.0)	36.8 (35.6–38.0)	60.3 (59.1–61.5)	20.7 (20.3–21.1)
JESTR_NR	15.6 (14.7–16.5)	37.5 (36.3–38.6)	60.6 (59.4–61.7)	20.0 (19.6–20.4)
MVP (mol-s)	26.4 (25.1–27.5)	58.9 (57.7–60.1)	86.9 (86.1–87.7)	12.3 (12.0–12.6)
MVP (mol-cs)	36.0 (34.8–37.2)	68.0 (66.8–69.1)	91.9 (91.2–92.6)	9.3 (9.1–9.6)
B. candidates by formula
MIST	9.6 (8.9–10.3)	22.1 (21.1–23.1)	41.1 (40.0–42.3)	12.8 (12.6–12.9)
ESP	11.1 (10.3–11.8)	27.4 (26.4–28.5)	52.2 (51.0–53.5)	16.4 (16.3–16.6)
JESTR	11.9 (11.1–12.7)	33.0 (31.8–34.1)	61.5 (60.2–62.7)	11.7 (11.5–11.9)
JESTR_NR	11.8 (11.1–12.7)	33.5 (32.3–34.7)	61.5 (60.2–62.7)	11.7 (11.5–11.9)
MVP (mol-s)	11.1 (10.4–11.9)	31.1 (30.0–32.3)	62.0 (60.8–63.2)	11.6 (11.4–11.8)
MVP (mol-cs)	14.0 (13.2–14.8)	36.9 (35.7–38.0)	68.1 (67.0–69.2)	10.4 (10.2–10.5)

Funding1

—National Institute of General Medical Sciences10.13039/100000057

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMetabolomics and Mass Spectrometry Studies · Computational Drug Discovery Methods · Machine Learning in Bioinformatics

Full text

Introduction

Untargeted metabolomics enables high-throughput profiling of thousands of metabolites, providing insights into cellular metabolism, disease mechanisms, biomarkers, and environmental effects. A fundamental challenge is metabolite annotation, the assignment of chemical structures to LC-MS/MS spectra. These spectra capture patterns of ion fragments as mass-to-charge (m/z) ratios and their corresponding intensities. Annotation typically involves matching experimental spectra to reference libraries such as NIST,? GNPS,? or MoNA.? However, annotation rates remain low due to limited library coverage.?

Low annotation rates have promoted the development of machine-learning (ML) methods that rank candidate molecules by their likelihood of producing a query spectrum. Existing approaches either map between molecular and spectral modalities or learn a joint embedding space. Mapping approaches include “inverse” methods such as de novo generation ?−? ? ? and fingerprint prediction ?,? (FigureA), and “forward” methods that simulate spectra from molecules ?−? ? ? ? ? (FigureB). In contrast, alignment approaches ?,? learn a joint embedding between the two modalities where matching molecule–spectrum pairs are placed close together, and nonmatching pairs are positioned farther apart (FigureC). Alignment approaches, when implemented well, have shown to outperform mapping approaches, e.g., JESTR.?

Three paradigms for spectra annotation: (A) inverse methods include de novo generation and fingerprint prediction from the query spectrum; (B) forward methods predict spectra from molecular candidates to match against the query spectrum; and (C) view alignment methods minimize the distance between a molecule and its spectral embedding.

Current alignment models, however, explore only a subset of the available views for each modality (Figure). Molecules can be represented as SMILES?/SELFIES,? fingerprints,? learned embeddings from pretrained models, ?,? structure-based fragmentation trees,? chemical formulas, or 3-D structures. Spectra can likewise be described by spectral motifs,? formula-based fragmentation trees,? subformula labels,? learned embeddings from pretrained models, ?−? ? substructure labels, or consensus spectra. We hypothesize that combining multiple views enables the model to capture both complementary and shared information while mitigating view-specific artifacts, ultimately improving cross-modal alignment.

Different views of molecular data (A) and spectral data (B). The views at the center represent the most widely used views for metabolite annotation: spectrum and graph. Other representations not shown include latent embeddings from pretrained models for both molecules and spectra.

We present MultiView Projection (MVP), a contrastive framework that learns from multiple molecular and spectral views. Molecules are represented as graphs and fingerprints, while spectra are represented as both individual and consensus spectra (aggregated from multiple measurements). Consensus spectra, in particular, offer richer representations by aggregating measurements across collision energies, a common practice for improving spectrum quality. ?,? Prior studies have shown that using multiple spectra enhances annotation accuracy. ?,?,? Importantly, by maximizing mutual information across these four “training views”, MVP jointly aligns all views in a shared embedding space, enabling more robust candidate ranking. While some prior methods incorporate additional molecular or spectral views through concatenation or auxiliary prediction tasks (Section S1), MVP differs as it offers a systematic framework that unifies multiple views.

MVP employs four encoders, one per view, and optimizes a contrastive loss across all pairwise view combinations (Figure). At inference, it is possible to annotate data through one of many possible “ranking views”: graph-spectra (mol-s), fingerprint-spectra (fp-s), graph-consensus spectra (mol-cs), and fingerprint-consensus spectra (fp-cs). We evaluate two strategies for leveraging multiple spectra per molecule: an aggregate-then-rank strategy, which combines spectral information prior to candidate ranking, and a rank-then-aggregate strategy, which first ranks candidates for each individual spectrum and then aggregates the rankings. We train and evaluate MVP on the MassSpecGym benchmark,? comparing against forward, inverse, and alignment-based baselines, ESP,? MIST,? and JESTR,? respectively. Results show that MVP significantly improves candidate retrieval, particularly when leveraging consensus spectra. Ablation studies further validate the benefit of integrating all four modalities and employing the aggregate-then-rank strategy.

Overview of MVP. (A) Four encoders are trained to place the four matching views close in the joint embedding space. (B) MVP employs the full-graph paradigm, where the overlap of any number of views denotes the number of view pairs considered during training. For example, considering all four views results in six view pairs used for training.

Our contributions are as follows:

Presenting MVP, a novel framework for jointly learning the embedding space of molecules and spectra that leverages four data views: molecular graph, molecular fingerprint, spectrum, and consensus spectrum. MVP learns representations that maximize the mutual information between different views of the same datum. We utilize and evaluate the learned representations for ranking molecular candidates for a given query spectrum.
Achieving competitive results on the MassSpecGym retrieval task when candidates are provided by mass or by formula. In particular, for candidates by mass, MVP outperforms MIST by 147% and JESTR by 135% for rank@1. Further, our ablation studies demonstrate the value of jointly learning from additional spectral and molecular views beyond traditional spectral/graph views used in prior work.
MVP presents a systematic framework for effectively leveraging consensus spectra in molecular annotation. Our results consistently show that utilizing a consensus spectrum yields superior annotation results when compared to using a single spectrum. Further, we show that molecular annotation based on the consensus spectrum outperforms the aggregation of annotation results on individual spectra of the same molecule. That is, an aggregate-then-rank annotation paradigm is superior to a rank-then-aggregate paradigm.

Methods

Molecule and Spectra View Representations

MVP uses two molecule-based views: graphs and fingerprints. A graph encodes a molecular structure by treating atoms as nodes and bonds as edges. We use the same graph featurizer as JESTR,? where node features include atom type, atomic mass, valence, ring membership, formal charge, radical electrons, chirality, degree, number of hydrogens, and aromaticity, and edge features include the bond type, ring membership, conjugacity, and the stereo configuration. A molecule’s fingerprint is attained using binary Morgan fingerprint (nbit = 1024, r = 5), computed using RDKit.?

MVP uses two spectra-based views: individual spectra and the consensus spectra. As with MIST, spectral peaks are first annotated using MIST’s peak formula annotation code based on an enumeration strategy given the ground truth chemical formula, keeping the 60 most abundant peaks for a given spectrum while using a ±20 ppm threshold between a peak’s m/z and a formula. Further, each peak is encoded as a vector $[eqn]$ , where the first 14 entries represent the atom count of elements, E = {C, H, O, N, P, S, Cl, F, Br, I, B, As, Si, Se} and the last entry represents the normalized intensity. The atom counts are scaled by dividing by the highest observed count of the corresponding element in the data set. As the normalization depends on the characteristics of the training data, the model performance may not be optimal when a formula contains an atom that has not been seen or has a count value greater than that seen in the training data. Both spectra and consensus spectra are represented as a set of peaks, where each peak is defined using its formula and intensity. For training, validation, and test sets, the consensus spectra are created by merging all peaks from spectra that share the same molecular identity. If multiple peaks share the same formula, one formula is kept, and the maximum of the intensities is retained.

Encoders

MVP uses the same graph encoder as JESTR - a graph convolution network (GCN) of 3 layers followed by a pooling layer and an MLP with 2 fully connected layers to generate the final structural embeddings, Z mol (Figure S1). The fingerprints are encoded by a 3-layered MLP to generate the fingerprint embeddings, Z fp. Both spectra and consensus spectra use encoders of the same architecture (Figure S2). Peaks are first encoded by a 3-layered MLP, and then the set of peaks is fed into a transformer encoder consisting of 2-attention heads, outputting embeddings of spectra Z s and consensus spectra Z cs. The exact dimensions of the model architecture are described in Table S1. The model is trained with the Adam optimizer? for 1,500 epochs with a learning rate of 7.0 × 10^–05^, a batch size of 64, and a contrastive loss temperature of 0.05.

Contrastive Learning with Multiple Views

MVP contrasts four views of a datum: its molecular graph and fingerprint, spectrum, and consensus spectrum. The objective of contrastive learning is to place views of the same data point close in a shared embedding space while ensuring nonmatching views are far apart. A nonmatching set of views arises from a molecule and any nonmatching spectra. We consider pairwise mutual information between the four views: Z mol, Z fp, Z s, and Z cs. The total loss function is formulated as

[eqn]

Each pairwise contrastive loss is computed symmetrically:

[eqn]

Following the CMC framework, the contrastive loss for a given view pair is defined as

[eqn]

where $[eqn]$ is a batch of samples, k is the number of negative samples v q ^ j ^ for a given sample v p ^ i ^, and v p ^ i ^ and v q ^ i ^ correspond to a matching pair. The function h is a discriminatory function that measures the similarity between two embeddings using cosine similarity, scaled by a hyperparameter, τ. The hyperparameter adjusts how sharply or softly the model distinguishes between positive and negative samples, with smaller τ encouraging the model to push negative pairs further apart. The function h is defined as

[eqn]

Results

MVP is evaluated on the retrieval task, where given a spectrum and a set of molecular candidates, MVP ranks the molecules using the embedding similarity of a spectra-based view against a molecule-based view. Consistent with standard benchmarking metrics, we report rank@k, where k = 1, 5, 20 indicates the percentage of the test data for which the correct molecule is ranked in the top k position. When we compare MVP against other annotation tools, we also report the MCES@1, a metric that measures the maximum common edge subgraph (MCES) distance between the target molecule and the molecule ranked at 1. The best-performing model achieves the highest rank@k percentages and the lowest MCES@1 value.

Data Set

We evaluate MVP on the MassSpecGym data set, a recently established benchmark data set for machine learning models for annotation-related tasks. Two candidate sets are provided. Candidates by mass are molecules whose molecular mass is within 10 ppm of that of the target molecule, while candidates by formula are molecules with the exact chemical composition as the target. Each molecule is associated with a maximum of 256 molecular candidates for each candidate set. MassSpecGym contains spectra with [M + H]^+^ and [M

Na]^+^ adducts. Peak labels were obtained through MIST’s formula annotation algorithm, which assumes that every peak is charged with the adduct, an assumption that may not always hold. Under this framework, peaks are labeled without the adduct but can be merged across spectra with different adducts if they share the same molecular formula. This facilitates the construction of the consensus spectra, though it does not account for possible differences in ion response between adducts. To mitigate potential inaccuracies from these assumptions, we assess performance not only on the whole data set (TableA) but also on the [M + H]^+^ subset (TableB) in some of our evaluations. By design, the MassSpecGym data set is challenging as the spectra in the data set are split such that molecules across splits have an MCES distance of at least 10 from one another. Strong performance on the MassSpecGym indicates that a model exhibit strong generalization and robustness. The MassSpecGym benchmark provides a reproducible and fair comparison with existing state-of-the-art methods, which were previously trained on this benchmark set.

1: MassSpecGym Statistics on the Whole Dataset (A) and the [M + H]+ Subset (B)

Which Ranking View is Best?

Across all ranking views, the best performance is consistently attained using the mol-cs ranking view for both candidates by mass and candidates by formula (Table S2). The performance improvements across the ranks when using the consensus spectra (mol-cs vs mol-s, and fp-cs vs fp-s) characterize the significant impact of leveraging multiple fragmentation patterns within each consensus spectrum at inference. The results indicate that MVP is able to place the consensus spectrum, on average, closer to the molecular views compared to individual spectra. Using the graph view provides better ranking performance than the fingerprint view. That is, mol-s ranking view always outperforms fp-s, and mol-cs outperforms fp-cs. The graph view, therefore, likely shares more mutual information with a spectrum. For the rest of this paper, we focus further evaluation on ranking views that use the graph view, but highlight the importance of the fingerprint view in the ablation study on the various training views.

Comparison with Other Annotation Tools

We compare MVP against MIST, ESP, and JESTR (Table). Confidence intervals in this work are estimated using bootstrap resampling, and therefore directly reflect variability due to sampling of the test set. The width of the bootstrap confidence intervals provides a natural bound on the meaningful numerical precision of reported performance metrics. Accordingly, we report metrics using a single decimal, commensurate with the observed bootstrap variability. Confidence intervals narrow at rank@20, reflecting reduced sensitivity to individual ranking perturbations and greater stability of the metric across bootstrap resamples. We also provide partial results for the MassSpecGym data set on SIRIUS? (Section S3.2), as its implementation is not publicly available and it cannot be pretrained on the same data set. MVP with the mol-cs ranking view outperforms other tools across all metrics for both candidates by mass and formula. Compared to MIST, MVP with the mol-cs ranking view improves rank@1 by 147% (candidates by mass) and 46% (candidates by formula), and MCES@1 by 40% (mass) and 19% (formula). Compared to JESTR and JESTR_NR_, MVP achieves average gains of 135% (mass) and 18% (formula) in rank@1, and 54% (mass) and 11% (formula) in MCES@1. Using the mol-s ranking view, MVP outperforms other works except for rank@1 and rank@5 compared to JESTR when ranking candidates by formula. It is possible that MVP’s use of peak formulas as input to the spectral encoders results in poor differentiation of molecular candidates, possibly explaining MVP’s worse performance than JESTR, which does not utilize peak formulas. We further explore this issue in the comparative study on spectral representation. Nevertheless, the MCES@1 for MVP (mol-s) is smaller than that of JESTR, suggesting that, in general, candidates ranked in the first position by MVP are more similar to the target structures than those ranked by JESTR. Overall, the improvement is more significant in ranking candidates by mass than in ranking candidates by formula, likely because the formula-annotated peaks help discern candidates of the same mass but different formulas.

2: Ranking Performance on MassSpecGym Compared to Other Tools

Aggregate-Then-Rank Strategy Using the Consensus Spectrum or

Rank-Then-Aggregate Strategy?

Given multiple query spectra measurements for the same molecule and a set of candidate molecules, MVP provides the opportunity to first create a consensus spectrum and then rank the candidates using the mol-cs ranking view. MVP therefore implements an “aggregate-then-rank” paradigm. As an alternative, it is possible to rank the candidates for each spectrum and then aggregate the ranking results. Within this “rank-then-aggregate paradigm”, MVP provides a ranking for each candidate molecule for each query spectrum using the mol-s ranking view. As such a strategy suggests different rankings on the candidates for each query spectrum, we evaluate two rank aggregation methods. The average rank method averages the rank of each molecular candidate across all spectral queries. The ranking of each candidate is based on its mean rank, with lower average ranks indicating a better rank. This simple and intuitive method equally weights all spectral queries. The reciprocal rank method assigns a score by summing the reciprocal of the ranking position (1/rank) for each candidate. A candidate with a higher score ranks better. This method prioritizes highly ranked candidates more strongly, making it more sensitive to cases where a candidate is ranked very well in some queries but poorly in others. An illustration of both paradigms is shown in Figure S3.

We compare the two rank-then-aggregate methods against MVP’s mol-cs ranking. As the number of spectra per target molecule varies in the test set, we limit the evaluation to a subset of the test set for which a test molecule has exactly three spectra, since the majority of the molecules have three spectra. There are 1531 such cases (Figure S4). The rank-then-aggregate methods underperform the aggregate-then-rank method (Figure S5A,B). MVP therefore provides an effective method for exploiting the additional coverage provided by spectra measured at different collision energies and/or detected with different adducts.

Like MVP, CFM-ID? leverages multiple spectra for annotation. Users can input up to three spectra corresponding to low, medium, and high collision energies, which are then matched against simulated spectra. Candidate molecules are ranked based on the average score across the three simulated spectra, following a rank-then-aggregate approach. We benchmark against the publicly available CFM-ID 4.0 pretrained model using a merged spectra data set derived from the MassSpecGym data set, and ranking the candidates by formula. For each test molecule, we randomly sample three spectra (one for each collision energy level) and rank the candidates accordingly. To ensure a fair comparison, we exclude spectra whose molecules appear in CFM-ID’s training set. CFM-ID achieves a 2.0% rank@1 performance (60/3004 consensus spectra), whereas MVP achieves 13.9% rank@1 (418/3004), further demonstrating the advantages of the aggregate-then-rank strategy. When leveraging all available spectra for each molecule, MVP further improves to 14.4% rank@1 (432/3004), emphasizing the value of incorporating all relevant information. The effectiveness of using the consensus spectrum depends on the availability of multiple spectra collected at various collision energies. Therefore, molecules with limited collision energy coverage may not benefit from using the consensus spectra view. MVP addresses this limitation with a flexible design that leverages available data to improve annotation accuracy.

Comparative Study on Spectra Representation

As JESTR, which uses binned spectral representation without formulas, outperformed MVP on candidates by formula for ranks 1–5, we compare peak formulas representation of the spectra with a binned representation. We denote the MVP spectra representation as FormSpec, whereas the binned version is referred to as BinnedSpec. The MVP architecture setup remains the same, except that the spectra and consensus spectra encoders are replaced with 3-layered MLPs, similar to the spectra encoder architecture used in JESTR (Figure S2). We compare MVP when trained on MassSpecGym and the [M + H]^+^ subset. Overall, using FormSpec provides improved results over the BinnedSpec across both data sets and all metrics, except for rank@1 for both mol-cs and mol-s ranking views and rank@6–8 for the mol-cs ranking view with formula-based candidates when trained on the whole data set (Table S3 and staircase plots in Figure S6A,B).

For candidates by mass, when using both mol-s and mol-cs ranking views, there is a remarkable improvement across all ranks using the formula representation. The mol-s ranking view achieves 26.4% rank@1 with FormSpec vs 12.3% rank@1 with BinnedSpec, a 115% improvement. The mol-cs ranking view achieves 36.0% rank@1 with FormSpec vs 16.6% with BinnedSpec, a 116% improvement when evaluated on the whole data set. These results indicate that MVP’s spectral encoders can clearly differentiate candidates from the target molecule. When analyzing the distribution of cosine similarities between the query spectrum and its target molecules against the average similarity of the query spectrum and the corresponding candidate molecules, MVP’s formula-based spectral encoders better differentiate the target molecule from the candidates when compared to using binned-based spectral encoders (Figure S7A vs B). The results when using the [M + H]^+^ subset are consistent with those obtained using all data points for the MassSpecGym.

For candidates by formula, training MVP using binned spectra outperforms the use of formula-based spectra for rank@1, but the FormSpec model outperforms the BinnedSpec at higher ranks (Figure S6B). When examining the distribution of the cosine similarities between query spectrum and target molecule and the average cosine similarity between query spectrum and candidate molecules, the differentiation between target and candidate is not as pronounced as was the case for candidates by mass (Figure S7C). The distribution of the candidate-spectra similarity shifts to the right, compared to the case when using FormSpec, along with the target-spectra similarity. This reflects the difficulty in distinguishing candidates from their relevant targets when they share the same chemical formulas. The right shift is more pronounced for FormSpec compared to BinnedSpec (Figure S7C vs D), indicating that the spectra and molecule encoders are learning associations related to the overall chemical composition of the molecule rather than the structural relation between spectra and molecules. However, when utilizing the [M + H]^+^ subset, FormSpec outperforms BinnedSpec, indicating that potentially inaccurate peak labels for the [M + Na]^+^ spectra may contribute to the poorer performance when using FormSpec at rank@1 when training on the MassSpecGym.

As there are multiple spectra per molecule in the test set, we examine the correlation of the two spectral representations with rank variability. For all test molecules with 2, 3, or 4 spectra, we compute the normalized rank difference, defined as the range of the ranks divided by the number of candidates (Figure S7E,F). For both candidates by mass and formula, we observe significantly smaller rank variability when using FormSpec, therefore indicating that the use of formulas enables the spectra encoder to learn a more uniform spectra representation for the same molecule.

Are All Four Views Needed? An Ablation Study on Training Views

We study the contribution of each view by eliminating one or more views when training the model, resulting in four models, trained on mol-fp-s-cs, mol-s-cs, mol-fp-s, or mol-s training views (Table S4A). There are no clear patterns of which of the four models consistently outperforms the others across all ranks. For example, when ranking candidates by mass using the mol-cs view, the mol-fp-s-cs and the mol-s-cs models are competitive across the ranks (Figure S8A). Similar competitive patterns are observed with the four models using the mol-s ranking view. When ranking candidates by formula, competitive patterns are also observed across the ranks (Figure S8B). These observations suggest that additional views do not consistently improve performance across the ranks.

However, when training only on the [M + H]^+^ subset of the MassSpecGym data (Table S4B), we observe more consistent patterns across the ranks. The mol-fp-s-cs model trained on all four views outperforms models trained on fewer views, demonstrating the value of additional views. Furthermore, removing the fingerprint view consistently reduces performance. For example, when ranking candidates by mass with the mol-s views, the rank@1 performance drops from 29.7 to 25.6% when the fingerprint view is removed from the mol-fp-s-cs model. Similarly, the rank@1 drops from 29.2 to 27.8% when the fingerprint view is removed from the mol-fp-s model. Similar drops in performance due to removing the fingerprint view are observed when ranking candidates by formula, indicating the fingerprint view offers nuances critical for ranking. Finally, considering that the performance of the models when trained on the [M + H]^+^ subset surpasses that of models trained on the whole data set, training specialized models or finetuning a general model for each adduct may be beneficial.

MVP Reveals Differential Metabolites in Cultured and Conventional

Meat

To illustrate the applicability of MVP within metabolomics workflows, we apply it to data from a recent study comparing the metabolomic profiles of traditional chicken meat (CON), muscle satellite cells (CIC), and myotube-formed cells (CAC).? This investigation is particularly important for evaluating the nutritional value and safety of cultured meat, an emerging alternative to conventional animal products. LC–MS/MS data in mzML format are publicly available on the Metabolomics Workbench.? The original study identified 321 features corresponding to 214 unique molecules.

For this evaluation, we focus on the positive ion mode data. Since neither the processed spectra nor the mapping to the identified metabolites was provided, we replicated the study using a standard metabolomics workflow and report the hit rate. Feature extraction and representative spectra selection are performed using MZmine3,? followed by formula and adduct assignment with SIRIUS v6.1.0.? Candidate structures are retrieved within a 0.01 Da tolerance from FoodDB,? HMDB,? MassSpecGym,? NIST23,? and the GNPS Multiplex library.? Further, recognizing that MVP relies on the correct molecule being present in the candidate pool and accounting for potential adduct misassignments, we include all reported compounds in the candidate set. To improve confidence in annotations by MVP, we focus on results with a molecule–spectra score greater than 0.5 and a margin of at least 0.01 between the top- and second-ranked candidate scores.

Using this workflow, 900 features are detected with MZmine3, of which 866 receive formula and adduct assignments from SIRIUS. MVP subsequently annotates 640 spectra corresponding to 146 unique molecules that met the defined criteria (Figure S9A,B). Metabolite classification using NPClassifier? demonstrates that MVP annotations encompass all pathways reported in the original study, with consistent enrichment patterns (FiguresA and S9C). In particular, amino acids and peptides are significantly more abundant in conventional meat compared to cultured meat, while other metabolic pathways showed comparable abundances.

Case study results. (A) Total abundance of each pathway per sample type. (B) Putative metabolites identified by MVP. Each molecule is labeled with an ID, and its pathway and regulation. The IDs correspond to the labels in the volcano plots that show the fold change and p-value between CIC and CON features (C) and CAC and CON features (D).

Of the 146 metabolites identified by MVP, 73 are newly annotated, highlighting the model’s ability to expand metabolite coverage beyond existing workflows. Several metabolites (FigureB) that are associated with the terpenoid, alkaloid, and shikimate and phenylpropanoid pathways consistently showed differences in metabolite abundance in both CIC and CAC relative to CON (FigureC,D). Four molecules (#7, 8, 24, and 41) are consistently elevated, and two (#9 and 60) are reduced. The reported annotations are putative and require experimental validation.

Conclusions

We presented MVP, a novel framework for learning a joint spectral/molecular embedding space using multiple data modalities: molecular graphs, fingerprints, spectra, and consensus spectra. We demonstrated the effectiveness of this framework in ranking molecular candidates for spectral annotation. While prior work incorporated these modalities as multitasks or via concatenation, MVP jointly learns from the mutual information among the views. Another contribution of this work is that MVP is the first annotation tool designed to enable annotation based on the consensus spectra. While each individual constituent spectrum is a subset of the consensus spectrum, the consensus spectrum view serves as an anchor for different spectra of the same molecule, promoting the model to learn uniform spectra representation regardless of instrument conditions when they share the same molecule. Comparing annotation using the consensus spectrum against ranking-then-aggregating ranking results for its constituent spectra, we showed that the aggregate-then-rank paradigm produces superior performance. We also showed that annotation using the mol-cs view outperforms all other views. On the MassSpecGym benchmarking data set, MVP’s mol-cs ranking view achieves 36.0% rank@1, a 147 and 135% improvement over MIST and JESTR, respectively, for candidates by mass. For candidates by formula, MVP achieves 14.0% rank@1, a 46 and 18% improvement over MIST and JESTR, respectively. The ability to rank candidate molecules based on the query consensus spectrum will be valuable for practitioners who often extract several measurements for the same metabolite. With MVP, there is no need to decide how to aggregate and interpret the rankings for the various individual constituent spectra. Further, our comparative and ablation studies show that including peak formulas is advantageous for ranking candidates by mass, and that learning from all four views provides improved performance compared to fewer views. Lastly, we demonstrate a practical integration of MVP with existing metabolomics workflow on a study that compares metabolic profiles between cultured and conventional meat. With MVP annotations, we observe reduced amino acids and peptide levels in cultured meat, consistent with the original findings. Additionally, 73 putative metabolites were uncovered, with six consistently increased or reduced in cultured meat samples. Overall, MVP demonstrates the potential of using additional information beyond a spectrum and its corresponding molecular structure to improve representation quality for molecule-spectra matching. While MVP achieves strong performance, there are several directions for future improvement. Representation quality could potentially benefit from more accurate peak labeling and additional views. As ranking is a separate task from learning representations, including losses that prioritize ranking molecular candidates could improve ranking performance. Finally, MVP can be extended to related tasks such as ranking scaffolds and identifying analogs, further demonstrating its versatility and impact.

Supplementary Material

Bibliography42

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1NIST, NIST Tandem Mass Spectral Library 2023 Release. 2023, https://www.nist.gov/programs-projects/tandem-mass-spectral-library/ (accessed 2025-03-26).
2Wang M.Carver J. J.Phelan V. V.Sanchez L. M.Garg N.Peng Y.Nguyen D. D.Watrous J.Kapono C. A.Luzzatto-Knaan T.Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking Nat. Biotechnol.20163482883710.1038/nbt.359727504778 PMC 5321674 · doi ↗ · pubmed ↗
3Mo NA, Mass Bank of North America. 2024, https://mona.fiehnlab.ucdavis.edu/ (accessed on 2025–03–26)
4Bittremieux W.Wang M.Dorrestein P. C.The critical role that spectral libraries play in capturing the metabolomics community knowledge Metabolomics 2022189410.1007/s 11306-022-01947-y 36409434 PMC 10284100 · doi ↗ · pubmed ↗
5Stravs M. A.Dührkop K.Böcker S.Zamboni N.MS Novelist: de novo structure generation from mass spectra Nat. Methods 20221986587010.1038/s 41592-022-01486-335637304 PMC 9262714 · doi ↗ · pubmed ↗
6Litsa E. E.Chenthamarakshan V.Das P.Kavraki L. E.An end-to-end deep learning framework for translating mass spectra to de-novo molecules Commun. Chem.2023613210.1038/s 42004-023-00932-337353554 PMC 10290119 · doi ↗ · pubmed ↗
7Wang, Y. ; Chen, X. ; Liu, L. ; Hassoun, S. MADGEN–Mass-Spec attends to De Novo Molecular generation. ar Xiv preprint ar Xiv:2501.01950 2025, 10.48550/ar Xiv.2501.01950. · doi ↗
8Bohde, M. ; Manjrekar, M. ; Wang, R. ; Ji, S. ; Coley, C. W. Diff MS: Diffusion Generation of Molecules Conditioned on Mass Spectra. ar Xiv preprint ar Xiv:2502.09571 2025, 10.48550/ar Xiv.2502.09571. · doi ↗