MS1FA: Shiny app for the annotation of redundant features in untargeted metabolomics datasets
Ruibing Shi, Frank Klawonn, Mark Brönstrup, Raimo Franke

TL;DR
MS1FA is a web-based tool that helps scientists annotate redundant features in metabolomics data using multiple approaches in one interactive platform.
Contribution
MS1FA is the first web-based platform integrating multiple annotation methods for redundant features in metabolomics.
Findings
MS1FA combines correlation-based grouping with MS2 data for reliable annotation of redundant features.
The platform supports annotation using MS1 data only, MS2 data only, or both.
MS1FA provides interactive exploration and download options for annotated feature tables.
Abstract
Untargeted metabolomics, the comprehensive analysis of small molecules in biological systems, has become an invaluable tool for understanding physiology and metabolism. However, the annotation of metabolomic data is often confounded by the presence of redundant features, which can arise from e.g. multimerization, in-source fragments (ISFs), and adducts. MS1FA uniquely integrates all major annotation approaches for redundant features within a single interactive platform. It combines correlation-based grouping with reliable ISF annotation using MS2 data and operates with MS1 data only, MS2 data only, or both. Additionally, it offers a distinctive method for grouping features based on relational criteria. As the only web-based platform with these capabilities, MS1FA provides easy access and allows users to explore and annotate the feature table interactively, with options to download the…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1- —Lower-Saxony Ministry of Science and Culture within the doctoral program
- —EU Horizon
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMetabolomics and Mass Spectrometry Studies · Advanced Proteomics Techniques and Applications · Microbial Metabolic Engineering and Bioproduction
1 Introduction
Untargeted metabolomics aims to identify and quantify all metabolites in a biological sample. Liquid chromatography coupled with electrospray ionization (ESI) mass spectrometry is the most common analytical method used for this purpose. It offers versatile and sensitive detection of metabolites, but still poses the challenge to annotate tens of thousands of features and to identify metabolites. The majority of peaks in untargeted LC-MS based metabolomics datasets correspond to isotopes, redundant ions caused by ionization-related phenomena, such as adduct formation, multimerization and in source-fragmentation (ISF), or contaminants and MS or bioinformatics artifacts. A correct and comprehensive annotation of redundant features is crucial for accurate and reliable metabolite identification. Simply removing isotopes, adducts, and in-source fragments (ISFs) does not inherently resolve this issue. Instead, these features contain essential structural and ionization information that, when systematically integrated, improve annotation accuracy rather than hinder it. Clark et al. demonstrated that depending on the mass spectrometer used, differences in fragmentation, charge states and adduct formation can cause substantial differences and only 25% overlap between the feature tables of the same samples (Clark et al. 2021). Xu et al. found a high number of cases where in-source fragments mimic cellular metabolites which can result in misannotation and misidentification, potentially leading to wrong conclusions (Xu et al. 2015).
Various tools have been developed to aid feature annotation and address the problem of degenerate features. CAMERA, one of the earliest available tools, uses peak shape correlation to group features and annotate isotopes, adducts and typical neutral losses (Kuhl et al. 2012). A similar approach is followed by the R package CliqueMS (Senan et al. 2019). Ion identity molecular networking (IIMN) developed in the Dorrestein group integrates peak shape correlation analysis into molecular networking analysis (Schmid et al. 2021). NetID uses a global network optimization to annotate peaks based on biochemical transformations (Chen et al. 2021). ISFrag uses MS2 data to identify in-source fragments in LC-MS full scan data (Guo et al. 2021). However, it is still a major challenge to recognize all the redundant features and correctly annotate them.
Here, we introduce the novel tool MS1FA (MS1 feature annotation), which we designed to combine multiple approaches within one platform and offer the possibility to work with only MS1 data, with MS1 data combined with a pool sample measured in MS2 mode, or only MS2 files.
2 Materials and methods
MS1FA was developed in R and implemented as Shiny app, with essential parts written in C++ using the RCPP framework for fast processing and computation. It runs on a Shiny server which is freely accessible via https://ms1fa.helmholtz-hzi.de. The source code is deposited as Github archive: https://github.com/RuibingS/MS1FA_RShiny_dashboard. Figure 1 shows the workflow and processing steps of MS1FA which are discussed in the following paragraph.
The MS1FA workflow, consisting of three main steps: file upload (left panel), MS1FA algorithm (center panel), and output generation, including the annotated feature table and plots (right panel). Created with BioRender.com.
2.1 MS1FA inputs
A feature table generated by a peak picking algorithm such as XCMS or MZmine has to be uploaded (Fig. 1, left). To make use of the ISF annotation, MS2 data have to be provided, which can be either in the form of a single mzXML or mzML file from a pool sample or a mgf-file which can be generated from multiple files by MZmine. For metabolite annotation, a target list needs to be uploaded. MS1FA includes a predefined neutral loss table compiled by Oliver Fiehn’s lab (Ma et al. 2014) and a default ESI-MS adduct table assembled from published lists (Kuhl et al. 2012, Ma et al. 2014), but the users can also upload their own neutral loss and adduct annotation tables.
2.2 MS1FA algorithm
The steps of the MS1FA algorithm are depicted in Fig. 1, center:
The target list, containing the names and molecular formulas of metabolites to be annotated, is parsed and primary ions are calculated from the molecular formulas. If there is no target list available, the workflow can still be executed without one.The calculated primary ions are matched against the uploaded feature table by exact mass (m/z) and by retention time (rt) if available. The default m/z tolerance is 0.002 or 5 ppm and the default rt window is 20 s.Adduct annotation is carried out by calculating the adduct m/z values derived from the primary ion identified in step 2 according to the default common ESI adduct table. Next, adduct features are annotated by matching these m/z values to those in the feature table. The default m/z tolerance is 0.005 or 5 ppm and the default rt window is 2 s.In-source fragment (ISF) annotation is achieved by using MS2 data, ideally from data-dependent acquisition (DDA) runs (Supplementary Fig. S2). If DDA MS2 data from a pool sample are uploaded, the precursor ion (PI) m/z values are first aligned to the feature table m/z values (derived from MS1 spectra) using a default m/z tolerance of 0.002 or 5 ppm as threshold along within a default rt window of 20 s. When a PI is matched to a feature in the MS1 feature table, the corresponding MS2 spectrum m/z values are matched to MS1 features within a default m/z tolerance of 0.005 or 10 ppm.Annotation of isotope peaks and charge states: A pairwise m/z difference matrix is constructed within a default rt window of 3 s. ^13^C isotopes are detected by identifying m/z differences equal to 1.0034 and multiples (up to 4). For the first isotope, m/z tolerance of 0.002 is set and for the second and higher isotopes threshold 0.01 is used to also capture peaks dominated by heavier isotopes of S, Cl, or Br. To annotate multiply charged ions, m/z differences of 1.0034, 0.5017, and 0.3345 are detected to assign singly, doubly, and triply charged ions respectively.Neutral loss annotation: MS1FA calculates the m/z differences for each feature pair in the feature table and compares these against the neutral loss table. Only if a match occurs within a default rt window of 2 s and m/z tolerance within 0.002 m/z or 5 ppm, a neutral loss gets annotated.Grouping of features and group index assignment: To identify features derived from the same metabolite or otherwise related features, we categorize them into groups by using two grouping approaches. The first grouping method merges all features that are related to each other, such as in-source fragments, adducts, neutral losses, and isotopes, into one group (Supplementary Information S2.3). The second grouping method is based on perturbation profile similarity (Supplementary Information S2.1 and Supplementary Fig. S1). The basic idea is that if a metabolite has an x-fold altered intensity in condition A versus condition B, all features corresponding to this metabolite should have an x-fold altered intensity. A Pearson correlation matrix is calculated using log_10_ transformed intensity values with an rt threshold of 2 s and a correlation coefficient threshold of 0.8. A fully connected correlation network plot is constructed, ensuring that any pair of features within “corgroup” has a correlation value exceeding the threshold.
The user can adjust all thresholds and parameters mentioned above, select the desired correlation method (Pearson, Spearman, or Kendall) and interactively inspect the correlation test statistics for any pair of rows.
2.3 MS1FA outputs
The output of MS1FA (Fig. 1, right) includes an interactive feature table with all annotations, an interactive correlation network plot generated from the perturbation profile similarity approach, and box plots showing the intensity distribution for the correlated features. Selecting a row in the feature table automatically generates a network plot for the selected feature, where nodes represent feature names, m/z values, retention times, and annotations, while edges depict the correlation coefficients, m/z differences and neutral loss annotation if available. Box plots are then generated for these correlated features, illustrating the log_10_ intensity values across the selected samples. The feature table can be filtered in various ways and users can add their own annotations.
3 Results
The application of MS1FA was demonstrated using two LC-MS untargeted metabolomics datasets, both provided as demo data within the Shiny application and detailed in the user manual in Supplementary Information S4. The dataset “PA14” was generated by an untargeted metabolomics study of the bacterial pathogen Pseudomonas aeruginosa PA14 perturbed by different antibiotics (Franke et al. 2021). In an approach we coined “annotation by perturbation,” we generated differential abundances of PA14 metabolites. We use the correlation of feature abundance patterns to group features together in a so-called “corgroup” that should ideally contain only features from the same metabolite. Even without the availability of MS2 data, a deep annotation of features to metabolites of the target list is possible. Supplementary Fig. S3 shows the output for the metabolite phenylalanine with correct grouping of all features by correlation.
We established a second grouping approach, which uses any relationship between features (ISF-MS2 match, adduct, neutral loss, etc.) to link all related features. For instance for the natural product phlorizin, a glucoside of phloretin, MS1FA grouped twenty-seven features via ISF-MS2 relations, adduct-, neutral loss-, and isotope annotation. All features were correctly annotated, and the annotation of the neutral loss as anhydrohexose even provided structural information (Supplementary Fig. S4). This makes MS1FA especially useful for microbial metabolomics and natural product research, where reference MS2 data for secondary metabolites are often unavailable.
To evaluate the performance in terms of feature annotation and correct feature grouping of MS1FA, we generated a second dataset by spiking sixteen natural products into a complex P. aeruginosa metabolome extract (StM16). These compounds were selected because they tend to produce multiple MS1 features. To establish a reliable ground truth, we acquired reference spectra of the 16 standards across the full mass range, then cleaned and subtracted background signals, thereby defining the set of features derived from each individual compound. The MS1 signals of the reference spectra provided a reference against which the performance of the feature grouping of MS1FA in comparison to CAMERA and MZmine4 was evaluated. By this means False Discovery Rate (FDR) and True Positive Rate (TPR) and the respective confusion matrices could be calculated (Supplementary Tables S4 and S5 and Supplementary Figs S5–S7). MS1FA achieved the lowest FDR of 0.031 for feature grouping by intersecting “group” and “corgroup” assignments. Utilizing the union of the grouping method increased the TPR to 0.947 (at the cost of a higher FDR). In comparison, CAMERA and MZmine4 exhibited higher FDR of 0.155 and 0.152, respectively (Supplementary Table S5). Feature annotation was assessed by the number of correctly identified adducts, neutral losses, isotopes, and in-source fragments, as detailed in Supplementary Tables S7 and S8. Overall, MS1FA was able to annotate more features correctly compared to CAMERA, MZmine4 and ISFrag (Supplementary Tables S1 and S2).
In summary, MS1FA provides some unique capabilities such as feature grouping by any relation (Supplementary Table S6). In addition, it offers a unique combination of functionalities as well as the ease of use and interactivity of a web-based platform. The dual group indices assist users in refining their interpretation of annotation results and in double-checking their accuracy, leading to more confident metabolite identification. Users should be aware that false-positive matches can still lead to misannotations. This risk is particularly relevant when isomeric compounds with overlapping retention times and similar MS2 spectra are incorrectly assigned to the same feature group.
4 Conclusion
MS1FA is an interactive web-based Shiny app that enables comprehensive MS1 feature annotation of untargeted metabolomics datasets. It uniquely combines methods for identifying and annotating redundant features, which often presents a major challenge in the analysis. MS1FA processes data from widely used tools such as XCMS and MZmine, outputting an annotated feature table that can be interactively explored and exported. Key features include the matching of MS1 features to MS2 fragment data and correlation network analysis of intensity distributions post-perturbation, which are crucial for identifying in-source fragments. The app also annotates neutral losses, isotopes, adducts, and multimers, and supports metabolite identification through target list matching. Interactive plots allow users to explore correlation networks and intensity profiles, which are linked to the annotated peak table, providing an additional layer of data exploration.
Supplementary Material
btaf161_Supplementary_Data
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Chen L , Lu W, Wang L et al Metabolite discovery through global annotation of untargeted metabolomics data. Nat Methods 2021;18:1377–85.34711973 10.1038/s 41592-021-01303-3PMC 8733904 · doi ↗ · pubmed ↗
- 2Clark TN , Houriet J, Vidar WS et al Interlaboratory comparison of untargeted mass spectrometry data uncovers underlying causes for variability. J Nat Prod 2021;84:824–35.33666420 10.1021/acs.jnatprod.0c 01376 PMC 8326878 · doi ↗ · pubmed ↗
- 3Franke R , Overwin H, Häussler S et al Targeting bacterial gyrase with cystobactamid, fluoroquinolone, and aminocoumarin antibiotics induces distinct molecular signatures in Pseudomonas aeruginosa. m Systems 2021;6:e 00610-21.34254824 10.1128/m Systems.00610-21PMC 8407119 · doi ↗ · pubmed ↗
- 4Guo J , Shen S, Xing S et al IS Frag: de novo recognition of in-source fragments for liquid chromatography–mass spectrometry data. Anal Chem 2021;93:10243–50.34270210 10.1021/acs.analchem.1c 01644 · doi ↗ · pubmed ↗
- 5Kuhl C , Tautenhahn R, Böttcher C et al CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Anal Chem 2012;84:283–9.22111785 10.1021/ac 202450 g PMC 3658281 · doi ↗ · pubmed ↗
- 6Ma Y , Kind T, Yang D et al MS 2Analyzer: a software for small molecule substructure annotations from accurate tandem mass spectra. Anal Chem 2014;86:10724–31.25263576 10.1021/ac 502818 e PMC 4222628 · doi ↗ · pubmed ↗
- 7Schmid R , Petras D, Nothias L-F et al Ion identity molecular networking for mass spectrometry-based metabolomics in the GNPS environment. Nat Commun 2021;12:3832.34158495 10.1038/s 41467-021-23953-9PMC 8219731 · doi ↗ · pubmed ↗
- 8Senan O , Aguilar-Mogas A, Navarro M et al Clique MS: a computational tool for annotating in-source metabolite ions from LC-MS untargeted metabolomics data based on a coelution similarity network. Bioinformatics 2019;35:4089–97.30903689 10.1093/bioinformatics/btz 207PMC 6792096 · doi ↗ · pubmed ↗
