# Prediction of Metastasis in Paragangliomas and Pheochromocytomas Using Machine Learning Models: Explainability Challenges

**Authors:** Carmen García-Barceló, David Gil, David Tomás, David Bernabeu

PMC · DOI: 10.3390/s25134184 · Sensors (Basel, Switzerland) · 2025-07-04

## TL;DR

This paper presents a machine learning approach to predict metastasis in paragangliomas and pheochromocytomas while addressing the need for model explainability in healthcare.

## Contribution

The study introduces an architecture that combines data mining, machine learning, and explainability techniques to improve metastasis prediction and strengthen clinical trust in the models.

## Key findings

- Random Forest was the best-performing algorithm for predicting metastasis, with 96.3% accuracy, 96.5% precision, and an AUC of 0.963.
- Explainability techniques were integrated to identify key predictive factors.
- A feature selection process identified the most influential variables and explored how their inclusion affects model performance.
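
As a rough illustration of how influential variables can be ranked, the sketch below computes permutation importance: shuffle one feature at a time and measure how much accuracy drops. This summary does not name the explainability or feature selection technique the authors used, so permutation importance is an assumed stand-in. The nearest-centroid classifier (in place of Random Forest), the synthetic data, and every function name here are likewise hypothetical.

```python
import random

def make_toy_data(n=200, seed=0):
    """Synthetic stand-in data (NOT the paper's dataset): feature 0 is
    strongly informative, feature 1 weakly informative, feature 2 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([
            4.0 * label + rng.gauss(0, 1),
            0.5 * label + rng.gauss(0, 1),
            rng.gauss(0, 1),
        ])
        y.append(label)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid classifier: store one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def model_accuracy(centroids, X, y):
    hits = sum(predict(centroids, x) == t for x, t in zip(X, y))
    return hits / len(y)

def permutation_importance(centroids, X, y, n_repeats=10, seed=1):
    """Mean accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = model_accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - model_accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

X, y = make_toy_data()
model = fit_centroids(X, y)
baseline, importances = permutation_importance(model, X, y)
# Feature indices ordered from most to least influential.
ranking = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
```

For brevity the sketch scores the model on the data it was fitted to. A real analysis would measure importance on held-out data, and a feature selection step could then keep only the top-ranked features before retraining.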

## Abstract

One of the main issues with paragangliomas and pheochromocytomas is that these tumors have up to a 20% rate of metastatic disease, which cannot be reliably predicted. While machine learning models hold great promise for enhancing predictive accuracy, their often opaque nature limits trust and adoption in critical fields such as healthcare. Understanding the factors driving predictions is essential not only for validating their reliability but also for enabling their integration into clinical decision-making. In this paper, we propose an architecture that combines data mining, machine learning, and explainability techniques to improve predictions of metastatic disease in these types of cancer and enhance trust in the models. A wide variety of algorithms have been applied for the development of predictive models, with a focus on interpreting their outputs to support clinical insights. Our methodology involves a comprehensive preprocessing phase to prepare the data, followed by the application of classification algorithms. Explainability techniques were integrated to provide insights into the key factors driving predictions. Additionally, a feature selection process was performed to identify the most influential variables and explore how their inclusion affects model performance. The best-performing algorithm, Random Forest, achieved an accuracy of 96.3%, precision of 96.5%, and AUC of 0.963, among other metrics, combining strong predictive capability with explainability that fosters trust in clinical applications.
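
The abstract reports accuracy, precision, and AUC. As a reminder of what those numbers measure, here is a minimal sketch that computes them from scratch for a binary task (1 = metastatic, 0 = non-metastatic). The labels, scores, and the 0.5 decision threshold are made-up illustrative values, not the paper's data, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of all cases classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative values only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))   # 0.75
print(precision(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))    # 0.9375
```

Accuracy and precision depend on the chosen threshold, while AUC is computed from the raw scores and summarizes ranking quality across all thresholds.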

## Linked entities

- **Diseases:** paragangliomas (MONDO:0000448), metastatic disease (MONDO:0024883)

## Full-text entities

- **Diseases:** Pheochromocytomas (MESH:D010673), cancer (MESH:D009369), Metastasis (MESH:D009362), Paragangliomas (MESH:D010235)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12252517/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12252517/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/PMC12252517/full.md

---
Source: https://tomesphere.com/paper/PMC12252517