# Towards robust medical machine olfaction: Debiasing GC-MS data enhances prostate cancer diagnosis from urine volatiles

**Authors:** Adan Rotteveel, Wen-Yee Lee, Zoi Kountouri, Nikolas Stefanou, Howard Kivell, Clifford Gluck, Shuguang Zhang, Andreas Mershin, Li Yang

PMC · DOI: 10.1371/journal.pone.0314742 · PLOS One · 2025-05-30

## TL;DR

This paper introduces a machine learning method that detects prostate cancer from urinary volatile organic compounds (VOCs) in raw GC-MS data, offering a non-invasive alternative that could reduce unnecessary biopsies.

## Contribution

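The paper proposes a machine learning pipeline that extracts prostate cancer "scent character" signatures from raw GC-MS data without identifying individual molecules: chromatogram time series are converted into images and classified with convolutional neural networks (CNNs).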
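This summary does not say how the chromatogram time series are turned into images. As a minimal sketch, assuming one common time-series-to-image transform, the Gramian Angular Summation Field (GASF), the code below first shortens a trace by piecewise aggregate approximation (PAA) and then encodes it as a symmetric N x N image suitable for a CNN. The functions `paa` and `gramian_angular_field` are hypothetical helpers, not the authors' code, and the toy trace is not real GC-MS data.

```python
import math


def paa(series, n_out):
    """Piecewise aggregate approximation: average-pool a trace to n_out points."""
    n = len(series)
    if not 0 < n_out <= n:
        raise ValueError("n_out must be between 1 and len(series)")
    out = []
    for k in range(n_out):
        # Near-equal, non-empty chunks because n_out <= n.
        chunk = series[k * n // n_out:(k + 1) * n // n_out]
        out.append(sum(chunk) / len(chunk))
    return out


def gramian_angular_field(series):
    """Gramian Angular Summation Field: a symmetric N x N image of a 1-D trace."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # a flat trace maps to all -1 instead of dividing by 0
    # Min-max rescale to [-1, 1], clamping tiny floating-point overshoot.
    scaled = [max(-1.0, min(1.0, 2.0 * (v - lo) / span - 1.0)) for v in series]
    # Polar encoding: each intensity becomes an angle in [0, pi].
    phi = [math.acos(x) for x in scaled]
    # Pixel (i, j) is cos(phi_i + phi_j), combining two time points.
    return [[math.cos(a + b) for b in phi] for a in phi]


# Hypothetical usage on a toy trace (illustrative values only).
trace = [0, 1, 4, 9, 4, 1, 0, 0]
image = gramian_angular_field(paa(trace, 4))  # 4 x 4 image, values in [-1, 1]
```

Pooling first matters in practice: a GASF image grows quadratically with trace length, and raw chromatograms can contain thousands of scans. Because GC-MS already records intensity over retention time and m/z, feeding that 2-D matrix to a CNN directly is another plausible option.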

## Key findings

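- The model reaches a recall of 88% and an F1-score of 0.78 for prostate cancer classification, on par with similarly trained detection dogs. Because F1 is the harmonic mean of precision and recall, these figures imply a precision of roughly 0.70.
- Bypassing molecular identification sidesteps a known weakness of biomarker-based approaches, whose candidate VOC biomarkers have failed to generalize across studies.
- To counter confounders such as sample-source bias, the pipeline combines pre-processing and debiasing steps: empirical Bayes correction, baseline drift removal, and domain adversarial learning.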
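Empirical Bayes batch correction is best known from ComBat. The summary does not give the authors' exact formulation, so the following is a minimal sketch, assuming a ComBat-style location correction. Each feature (for example, a binned intensity) is standardized. Per-batch shifts are then estimated, and each noisy per-feature shift is shrunk toward a normal prior fitted across all features. Here, "batch" stands in for sample source. Unlike full ComBat, this sketch uses the overall rather than residual variance, does not adjust batch scale, does not iterate, and does not protect biological covariates such as diagnosis. `eb_location_correct` is a hypothetical name.

```python
import math
import statistics
from collections import defaultdict


def eb_location_correct(data, batches):
    """Simplified ComBat-style location correction with empirical Bayes shrinkage.

    data: list of samples, each a list of feature values.
    batches: one batch label per sample (e.g. the sample source).
    """
    n_features = len(data[0])
    # 1. Standardize each feature with its overall mean and standard deviation.
    alpha = [statistics.fmean(row[g] for row in data) for g in range(n_features)]
    sigma = []
    for g in range(n_features):
        s = math.sqrt(statistics.pvariance([row[g] for row in data]))
        sigma.append(s if s > 0 else 1.0)  # guard against constant features
    z = [[(row[g] - alpha[g]) / sigma[g] for g in range(n_features)] for row in data]

    groups = defaultdict(list)
    for row, b in zip(z, batches):
        groups[b].append(row)

    shift = {}
    for b, rows in groups.items():
        n = len(rows)
        # 2. Per-feature batch shift (gamma_hat) and within-batch variance.
        gamma_hat = [statistics.fmean(r[g] for r in rows) for g in range(n_features)]
        delta2 = [statistics.variance([r[g] for r in rows]) if n > 1 else 0.0
                  for g in range(n_features)]
        # 3. Normal prior on the shifts, estimated across features (the "empirical" part).
        gamma_bar = statistics.fmean(gamma_hat)
        tau2 = statistics.pvariance(gamma_hat)
        # 4. Posterior mean: pull each per-feature shift toward gamma_bar.
        star = []
        for g in range(n_features):
            denom = n * tau2 + delta2[g]
            star.append(gamma_hat[g] if denom == 0
                        else (n * tau2 * gamma_hat[g] + delta2[g] * gamma_bar) / denom)
        shift[b] = star

    # 5. Remove the shrunken shift and map back to the original units.
    return [[sigma[g] * (z_row[g] - shift[b][g]) + alpha[g] for g in range(n_features)]
            for z_row, b in zip(z, batches)]
```

Shrinkage helps most when a batch has few samples. In that case, a per-feature shift estimated from noisy data is pulled toward the batch-wide average instead of being over-corrected. Real use should keep diagnosis in the model. Otherwise, if cancer status is unevenly distributed across sources, the correction can erase true signal along with the bias.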

## Abstract

Prostate cancer (PCa) is a major, and increasingly global, health concern with current screening and diagnostic tools’ severe limitations causing unnecessary, invasive biopsy procedures. While gas chromatography–mass spectrometry (GC-MS) has been used to detect urinary volatile organic compounds (VOCs) associated with PCa, efforts to identify consistent molecular biomarkers have failed to generalize across studies. Inspired by the olfactory diagnostic capabilities of medical detection dogs, we do not reduce chromatograms to a list of compounds and concentrations. Instead, we deploy a machine learning approach that bypasses molecular identification: PCa “scent character” signatures are extracted from raw time series data transformed into image representations for classification via convolutional neural networks. To address confounding factors such as sample-source bias, we implement a multi-step pre-processing and debiasing pipeline, including empirical Bayes correction, baseline drift removal, and domain adversarial learning. The resulting model achieves classification performance on par with similarly trained canines, achieving a recall of 88% and an F1-score of 0.78. These findings demonstrate that, at least in the context of PCa detection from urine, machine learning-based scent signature analysis can serve as a fully non-invasive diagnostic alternative, with these early results being also relevant to the wider emergent field of medical machine olfaction.
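Domain adversarial learning is commonly implemented as a DANN with a gradient reversal layer. A shared feature extractor feeds two heads: a label head (here, cancer vs. control) and a domain head (here, sample source). The reversal layer passes features through unchanged, but on the backward pass it multiplies the domain gradient by -λ. As a result, the extractor learns features that help the label head while confusing the domain head. The minimal sketch below shows a single SGD step with a linear extractor, logistic heads, and hand-written gradients, assuming this DANN formulation. The authors' network is presumably built on their CNN and would rely on a framework's autograd. `dann_step` and its parameters are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dann_step(W, u, v, x, y, d, lam, lr):
    """One SGD step of a tiny domain-adversarial network on a single sample.

    W: feature extractor (list of rows), h = W @ x.
    u: label-head weights; v: domain-head weights (both length len(W)).
    y: class label in {0, 1}; d: domain label in {0, 1}; lam: reversal strength.
    """
    h = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    p_y = sigmoid(sum(a * b for a, b in zip(u, h)))
    p_d = sigmoid(sum(a * b for a, b in zip(v, h)))

    # Logistic-loss gradients w.r.t. each head's weights and w.r.t. h.
    grad_u = [(p_y - y) * hk for hk in h]
    grad_v = [(p_d - d) * hk for hk in h]
    dLy_dh = [(p_y - y) * uk for uk in u]
    dLd_dh = [(p_d - d) * vk for vk in v]

    # Gradient reversal: identity forward, times -lam backward, so the
    # extractor moves to *increase* the domain head's loss.
    grad_h = [gy - lam * gd for gy, gd in zip(dLy_dh, dLd_dh)]
    grad_W = [[gh * xj for xj in x] for gh in grad_h]

    # Each head minimizes its own loss; only the extractor sees the reversal.
    new_u = [a - lr * g for a, g in zip(u, grad_u)]
    new_v = [a - lr * g for a, g in zip(v, grad_v)]
    new_W = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W, grad_W)]
    return new_W, new_u, new_v
```

With `lam=0` this reduces to ordinary joint training of two heads. With `lam > 0`, whenever a feature direction helps both label and domain prediction equally, the extractor gradients cancel, so the model gains nothing from encoding sample source. DANN implementations often ramp λ up gradually during training so that the domain signal does not dominate early updates.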

## Linked entities

- **Diseases:** prostate cancer (MONDO:0005159)

## Full-text entities

- **Diseases:** PCa (MESH:D011471)
- **Chemicals:** VOCs (MESH:D055549)
- **Species:** Canis lupus familiaris (dog, subspecies) [taxon 9615]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12124533/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12124533/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12124533/full.md

---
Source: https://tomesphere.com/paper/PMC12124533