# PROTRIDER: protein abundance outlier detection from mass spectrometry-based proteomics data with a conditional autoencoder

**Authors:** Daniela Klaproth-Andrade, Ines F Scheller, Georgios Tsitsiridis, Stefan Loipfinger, Christian Mertes, Dmitrii Smirnov, Holger Prokisch, Vicente A Yépez, Julien Gagneur

PMC · DOI: 10.1093/bioinformatics/btaf628 · Bioinformatics · 2025-11-20

## TL;DR

PROTRIDER is a conditional autoencoder-based method for detecting aberrant protein abundance in mass spectrometry-based proteomics data, with applications in rare disease diagnostics and cancer proteomics.

## Contribution

PROTRIDER introduces a conditional autoencoder-based approach for detecting protein expression outliers with improved statistical calibration.

## Key findings

- PROTRIDER outperforms baseline approaches based on Z-scores of raw log-intensities, PCA, and Isolation forests in detecting protein expression outliers.
- Modeling the residuals with a Student’s t-distribution gives better statistical calibration than a Gaussian distribution.
- AlphaMissense pathogenic variants are enriched in protein expression outliers detected by PROTRIDER.
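
As an illustration of the simplest baseline named above, the following is a minimal sketch of per-protein Z-scores on log-intensities with missing values. It is not the PROTRIDER implementation: the function name `zscores_per_protein`, the use of `None` for missing values, and the toy matrix are assumptions made for this example.

```python
from statistics import mean, stdev


def zscores_per_protein(log_intensities):
    """Z-score each protein (column) across samples (rows).

    Missing values are encoded as None (an assumption of this sketch) and
    are skipped when estimating the mean and standard deviation.
    """
    n_proteins = len(log_intensities[0])
    z = [[None] * n_proteins for _ in log_intensities]
    for j in range(n_proteins):
        observed = [row[j] for row in log_intensities if row[j] is not None]
        if len(observed) < 2:
            continue  # not enough data to estimate a standard deviation
        mu, sd = mean(observed), stdev(observed)
        if sd == 0:
            continue  # constant protein: Z-scores are undefined
        for i, row in enumerate(log_intensities):
            if row[j] is not None:
                z[i][j] = (row[j] - mu) / sd
    return z


# Toy data: 5 samples x 2 proteins; the last sample is low for protein 0.
toy = [
    [10.0, 5.0],
    [10.2, None],
    [9.8, 5.0],
    [10.1, 5.0],
    [4.0, 5.0],
]
z = zscores_per_protein(toy)
most_extreme = min(range(len(toy)), key=lambda i: z[i][0])
```

A baseline of this kind does not correct for hidden confounders or technical covariates, which is the gap the abstract reports closing with conditional autoencoders.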

## Abstract

Detection of gene regulatory aberrations enhances our ability to interpret the impact of inherited and acquired genetic variation for rare disease diagnostics and tumor characterization. While numerous methods for calling RNA expression outliers from RNA-sequencing data have been proposed, the establishment of protein expression outliers from mass spectrometry data is lacking.

Here, we propose and assess various modeling approaches to call protein expression outliers across three datasets from rare disease diagnostics and oncology. We use as independent evidence the enrichment for outlier calls in matched RNA-seq samples and the enrichment for rare variants likely disrupting protein expression. We show that controlling for hidden confounders and technical covariates, while simultaneously modeling the occurrence of missing values, is largely beneficial and can be achieved using conditional autoencoders. Moreover, we find that the differences between experimental and fitted log-transformed intensities by such models exhibit heavy tails that are poorly captured with the Gaussian distribution and report stronger statistical calibration when instead using the Student’s t-distribution. Our resulting method, PROTRIDER, outperformed baseline approaches based on Z-scores of raw log-intensities, PCA, and isolation-based anomaly detection with Isolation forests. The application of PROTRIDER reveals significant enrichments of AlphaMissense pathogenic variants in protein expression outliers. Overall, PROTRIDER provides a method to confidently identify aberrantly expressed proteins applicable to rare disease diagnostics and cancer proteomics.
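
To make the heavy-tail point concrete, here is a small sketch comparing the two-sided tail probability of the same standardized residual under a Gaussian and under a Student's t-distribution. It uses only the Python standard library. The degrees of freedom (`DF = 5`), the residual value, and the helper names are assumptions chosen for illustration; they are not values or code from PROTRIDER, and the t-distribution tail is obtained here by plain Simpson integration of its density.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the standard Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(x, df, steps=2000):
    """P(|T| >= |x|), computed as 1 - 2 * integral of the density over [0, |x|].

    The integral uses Simpson's rule; steps must be even.
    """
    a = abs(x)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return max(0.0, 1.0 - 2.0 * central)


def gaussian_two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal variable."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


DF = 5          # assumed degrees of freedom, for illustration only
residual = 4.0  # hypothetical standardized residual
p_gauss = gaussian_two_sided_p(residual)
p_t = t_two_sided_p(residual, DF)
```

For a residual this far from zero, the Gaussian assigns a much smaller p-value than the heavier-tailed t-distribution. If the true residuals are heavy-tailed, Gaussian p-values would therefore overstate significance, which is consistent with the calibration improvement the abstract reports.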

PROTRIDER is freely available at github.com/gagneurlab/PROTRIDER and on Zenodo under the DOI zenodo.15569781.

## Linked entities

- **Diseases:** rare disease (MONDO:0021200), cancer (MONDO:0004992)

## Full-text entities

- **Diseases:** cancer (MESH:D009369)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12931418/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12931418/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC12931418/full.md

---
Source: https://tomesphere.com/paper/PMC12931418