# Molecular networking, conformal predictions and revised fingerprint-based models for discovering endocrine disruptors in mixtures

**Authors:** Yvonne Kreutzer, Ida Rahu, Ulf Norinder, Anneli Kruve

PMC · DOI: 10.1007/s00216-025-06303-2 · 2026-01-22

## TL;DR

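This paper evaluates three ways to prioritize potential endocrine disruptors in mixtures from tandem mass spectrometry data: molecular networking, conformal predictions, and revised fingerprint-based MS2Tox machine learning models.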

## Contribution

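The study introduces molecular networking (MN) and conformal predictions (CP) for feature prioritization in non-targeted screening (NTS), and revises the previously published fingerprint-based MS2Tox models for seven Tox21 Data Challenge endpoints.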
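Molecular networking links features whose tandem mass spectra are similar, so that an unidentified feature can inherit a hypothesis from its neighbors. The sketch below illustrates that idea with a plain binned cosine score. It is not the authors' pipeline: the function names, the `(m/z, intensity)` peak-list layout, the 0.01 bin width, and the 0.7 edge threshold are all assumptions chosen for illustration.

```python
import numpy as np

def binned_cosine(spec_a, spec_b, bin_width=0.01, max_mz=1000.0):
    """Cosine similarity of two MS2 peak lists given as (m/z, intensity) pairs.

    Peaks are placed on a fixed m/z grid (bin_width is an assumed value).
    """
    n_bins = int(round(max_mz / bin_width)) + 1

    def to_vector(spec):
        vec = np.zeros(n_bins)
        for mz, intensity in spec:
            idx = int(round(mz / bin_width))
            if 0 <= idx < n_bins:
                vec[idx] += intensity
        return vec

    a, b = to_vector(spec_a), to_vector(spec_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def network_edges(spectra, min_cosine=0.7):
    """Return (id_a, id_b, score) for every pair scoring at least min_cosine.

    The 0.7 cutoff is a hypothetical default, not a value from the paper.
    """
    ids = list(spectra)
    edges = []
    for i, id_a in enumerate(ids):
        for id_b in ids[i + 1:]:
            score = binned_cosine(spectra[id_a], spectra[id_b])
            if score >= min_cosine:
                edges.append((id_a, id_b, score))
    return edges

# Toy spectra: f1 and f2 share both fragments, f3 shares none.
toy_spectra = {
    "f1": [(100.0, 1.0), (150.0, 0.5)],
    "f2": [(100.0, 1.0), (150.0, 0.5)],
    "f3": [(200.0, 1.0)],
}
toy_edges = network_edges(toy_spectra)
```

In a prioritization setting, an unknown feature connected to a spectrum of a known active compound would be flagged for follow-up; how the known actives are chosen is outside this sketch.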

## Key findings

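- Revised fingerprint-based MS2Tox models achieved the lowest false positive rate, 0.35, at 90% recall on the test set; molecular networking and conformal predictions yielded 0.82 and 0.68, respectively.
- In a wastewater case study, the three approaches prioritized 29 of 189 LC/HRMS features in influent and effluent samples as potentially associated with AhR agonism.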
- Three features were identified at level 1, showing potential for combined prioritization strategies.
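The headline metric above, false positive rate at a fixed recall, can be computed as in the following minimal sketch. It assumes binary labels and a continuous score where higher means more likely active; the function name and the toy data are hypothetical, and the paper may define its threshold differently.

```python
import numpy as np

def fpr_at_recall(y_true, scores, target_recall=0.90):
    """False positive rate at the highest threshold that reaches target_recall."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos_scores = np.sort(scores[y_true])[::-1]  # active scores, descending
    if pos_scores.size == 0:
        raise ValueError("need at least one positive example")
    # Number of actives that must be recovered (epsilon guards float error).
    k = max(1, int(np.ceil(target_recall * pos_scores.size - 1e-9)))
    threshold = pos_scores[k - 1]
    predicted_pos = scores >= threshold
    fp = np.sum(predicted_pos & ~y_true)
    tn = np.sum(~predicted_pos & ~y_true)
    return fp / (fp + tn), threshold

# Toy data: 10 actives followed by 10 inactives.
toy_labels = [1] * 10 + [0] * 10
toy_scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.35, 0.10,
              0.50, 0.40, 0.30, 0.20, 0.15, 0.12, 0.08, 0.05, 0.03, 0.01]
toy_fpr, toy_threshold = fpr_at_recall(toy_labels, toy_scores)
```

On the toy data, recovering 9 of 10 actives requires a threshold of 0.35, which lets through 2 of the 10 inactives. A lower value is better: it means fewer inactive features are carried forward for every set of actives retained.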

## Abstract

Prioritizing high-risk features is a key step to reduce workload in non-targeted screening (NTS) when identifying environmental contaminants. Machine learning models from the MS2Tox toolbox have shown promise for feature prioritization, but rely heavily on the accuracy of molecular formulas and fingerprint features provided by SIRIUS + CSI:FingerID. In this study, we introduce and evaluate two new approaches, molecular networking (MN) and conformal predictions (CP), to discover unidentified compounds potentially posing endocrine-disrupting activity based on tandem mass spectral similarity. Furthermore, we revised the previously published MS2Tox models, leveraging molecular fingerprints for seven Tox21 Data Challenge endpoints. The fingerprint-based MS2Tox models achieved the lowest false positive rate, 0.35, at 90% recall on the test set, while MN and CP yielded 0.82 and 0.68, respectively. In a case study of transformation products and persistent chemicals in wastewater, these three approaches prioritized 29 features in influent and effluent samples as potentially associated with AhR agonism among 189 LC/HRMS features corresponding to transformation products and persistent chemicals. All candidate structures for prioritized features showed scaffolds related to AhR binding affinity. Three features were identified at level 1, showcasing the potential of combined feature prioritization strategies.
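For readers unfamiliar with conformal predictions, the sketch below shows a generic class-conditional (Mondrian) inductive conformal predictor for a binary active/inactive endpoint. It is a textbook-style illustration, not the method as implemented in the paper: the nonconformity measure, the function names, the calibration data, and the 0.25 significance level are all assumptions.

```python
import numpy as np

def conformal_p_values(cal_scores, cal_labels, test_scores):
    """Class-conditional conformal p-values for a binary classifier.

    Scores are assumed to be predicted probabilities of the active class.
    Nonconformity is 1 - score for class 1 (active) and score for class 0.
    Returns an array of shape (n_test, 2): columns are p(inactive), p(active).
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    test_scores = np.asarray(test_scores, dtype=float)
    p = np.zeros((test_scores.size, 2))
    for c in (0, 1):
        in_class = cal_labels == c
        alpha_cal = 1.0 - cal_scores[in_class] if c == 1 else cal_scores[in_class]
        alpha_test = 1.0 - test_scores if c == 1 else test_scores
        # Count calibration examples at least as nonconforming as each test point.
        counts = (alpha_cal[None, :] >= alpha_test[:, None]).sum(axis=1)
        p[:, c] = (counts + 1) / (alpha_cal.size + 1)
    return p

def prediction_sets(p_values, significance=0.25):
    """A label is kept in the prediction set when its p-value exceeds significance."""
    return p_values > significance

# Toy calibration set: four actives and four inactives.
toy_cal_scores = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1, 0.4, 0.7]
toy_cal_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_p = conformal_p_values(toy_cal_scores, toy_cal_labels, [0.85])
toy_sets = prediction_sets(toy_p)
```

For the toy test score of 0.85, the p-values are 0.2 for inactive and 0.8 for active, so at the assumed significance level the prediction set contains only the active label. Features whose set includes the active label would be the candidates for prioritization.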

The online version contains supplementary material available at 10.1007/s00216-025-06303-2.

## Full-text entities

- **Genes:** AHR (aryl hydrocarbon receptor) [NCBI Gene 196] {aka FVH3, RP85, bHLHe76}
- **Diseases:** endocrine-disrupting (MESH:D004700)
- **Chemicals:** MS2Tox (-)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12909378/full.md

---
Source: https://tomesphere.com/paper/PMC12909378