Molecular networking, conformal predictions and revised fingerprint-based models for discovering endocrine disruptors in mixtures
Yvonne Kreutzer, Ida Rahu, Ulf Norinder, Anneli Kruve

TL;DR
This paper introduces molecular networking and conformal predictions, alongside revised fingerprint-based MS2Tox models, to prioritize potential endocrine disruptors in mixtures from tandem mass spectrometry data.
Contribution
The study introduces molecular networking and conformal predictions for feature prioritization in non-targeted screening.
Findings
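Revised fingerprint-based MS2Tox models achieved the lowest false positive rate, 0.35, at 90% recall on the test set, compared with 0.82 for molecular networking and 0.68 for conformal predictions.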
Molecular networking and conformal predictions prioritized 29 features linked to AhR agonism in wastewater samples.
Three features were identified at level 1, showing potential for combined prioritization strategies.
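The metric quoted in the first finding, false positive rate at a fixed recall, can be illustrated with a minimal sketch. The function name `fpr_at_recall`, the labels, and the scores below are hypothetical and are not taken from the paper or from the MS2Tox toolbox; the sketch only assumes that each compound has a binary activity label and a model score where higher means more likely active.

```python
def fpr_at_recall(y_true, scores, target_recall=0.90):
    """False positive rate at the strictest threshold that reaches target_recall.

    y_true: sequence of 0/1 labels (1 = active compound).
    scores: sequence of floats, higher = more likely active.
    Illustrative helper only; not part of MS2Tox.
    """
    # Rank compounds from highest to lowest score.
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one inactive compound")

    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Lower the threshold one distinct score at a time,
        # consuming ties together so the threshold is well defined.
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / n_pos >= target_recall:
            return fp / n_neg
    return 1.0  # not reached: recall is 1.0 once every compound is accepted


# Toy data: 10 actives followed by 4 inactives (made-up scores).
labels = [1] * 10 + [0] * 4
model_scores = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.2,
                0.58, 0.5, 0.3, 0.1]
print(fpr_at_recall(labels, model_scores, target_recall=0.90))
```

Read this way, a false positive rate of 0.35 at 90% recall means that, at the score threshold needed to recover 90% of the active compounds, 35% of the inactive compounds are also flagged.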
Abstract
Prioritizing high-risk features is a key step to reduce workload in non-targeted screening (NTS) when identifying environmental contaminants. Machine learning models from the MS2Tox toolbox have shown promise for feature prioritization, but rely heavily on the accuracy of molecular formulas and fingerprint features provided by SIRIUS + CSI:FingerID. In this study, we introduce and evaluate two new approaches, molecular networking (MN) and conformal predictions (CP), to discover unidentified compounds with potential endocrine-disrupting activity based on tandem mass spectral similarity. Furthermore, we revised the previously published MS2Tox models, leveraging molecular fingerprints for seven Tox21 Data Challenge endpoints. The fingerprint-based MS2Tox models achieved the lowest false positive rate, 0.35, at 90% recall on the test set, while MN and CP yielded 0.82 and 0.68, respectively. In…
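The abstract does not spell out how the conformal predictions are computed. As general background, a class-conditional (Mondrian) inductive conformal classifier can be sketched as follows. This is a generic textbook formulation and not the authors' implementation; the function names, the nonconformity scores, and the significance level are all assumptions chosen for illustration. Here a nonconformity score is any number that grows as a compound looks less typical of a class, for example one minus a predicted class probability.

```python
def conformal_p_value(calibration_scores, test_score):
    """p-value of a test compound for one class.

    calibration_scores: nonconformity scores of held-out calibration
        compounds that truly belong to this class.
    test_score: nonconformity score of the test compound for this class.
    """
    # Count calibration compounds that are at least as nonconforming.
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(cal_scores_by_class, test_scores_by_class, significance=0.2):
    """Return every class label whose p-value exceeds the significance level."""
    labels = set()
    for label, test_score in test_scores_by_class.items():
        p = conformal_p_value(cal_scores_by_class[label], test_score)
        if p > significance:
            labels.add(label)
    return labels


# Hypothetical calibration scores per class (made-up numbers).
calibration = {
    "active": [0.1, 0.2, 0.3, 0.4],
    "inactive": [0.05, 0.1, 0.15, 0.6],
}
# Hypothetical test compound: conforms well to "active", poorly to "inactive".
test_compound = {"active": 0.2, "inactive": 0.8}
print(prediction_set(calibration, test_compound, significance=0.25))
```

The output is a set of labels rather than a single call: a compound can receive one label, both labels (the model cannot decide at this significance level), or none (the compound resembles neither class). In a prioritization setting, one plausible choice is to carry forward features whose prediction set contains the active label.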
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Bioinformatics · Metabolomics and Mass Spectrometry Studies
