# Seeing the primary tumor because of all the trees: Cancer type prediction on low-dimensional data

**Authors:** Julia Gehrmann, Devina Johanna Soenarto, Kevin Hidayat, Maria Beyer, Lars Quakulinski, Samer Alkarkoukly, Scarlett Berressem, Anna Gundert, Michael Butler, Ana Grönke, Simon Lennartz, Thorsten Persigehl, Thomas Zander, Oya Beyan

PMC · DOI: 10.3389/fmed.2024.1396459 · Frontiers in Medicine · 2024-08-27

## TL;DR

This study shows that low-dimensional routine clinical data can predict the primary tumor location in cancer patients with performance comparable to approaches that use high-dimensional data such as images or genetic data.

## Contribution

The novelty lies in demonstrating that low-dimensional routine clinical information suffices as input for tree-based models that predict the location of the primary tumor, a task previously approached with high-dimensional data.

## Key findings

- A tree-based model reached a mean accuracy of 94% and a mean MCC of 0.92 in 10-fold nested cross-validation when distinguishing four cancer types from low-dimensional data.
- For eight cancer types, the same model achieved a mean accuracy of 85% and a mean MCC of 0.81, comparable to approaches that use high-dimensional input data.
- The distribution pattern of metastases appears to be important for predicting the primary tumor location.

## Abstract

The Cancer of Unknown Primary (CUP) syndrome is characterized by identifiable metastases while the primary tumor remains hidden. In recent years, various data-driven approaches have been suggested to predict the location of the primary tumor (LOP) in CUP patients, promising improved diagnosis and outcome. These LOP prediction approaches use high-dimensional input data like images or genetic data. However, leveraging such data is challenging and resource-intensive, and therefore a potential translational barrier. Instead of using high-dimensional data, we analyzed the LOP prediction performance of low-dimensional data from routine medical care. With our findings, we show that such low-dimensional routine clinical information suffices as input data for tree-based LOP prediction models. The best model reached a mean Accuracy of 94% and a mean Matthews correlation coefficient (MCC) score of 0.92 in 10-fold nested cross-validation (NCV) when distinguishing four types of cancer. When considering eight types of cancer, this model achieved a mean Accuracy of 85% and a mean MCC score of 0.81. This is comparable to the performance achieved by approaches using high-dimensional input data. Additionally, the distribution pattern of metastases appears to be important information in predicting the LOP.
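The abstract reports both Accuracy and MCC. As a minimal sketch of why the two can differ for a multi-class task, the snippet below computes the standard multi-class generalization of the Matthews correlation coefficient from label counts. The function name `multiclass_mcc`, the cancer-type labels, and the toy predictions are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from two label sequences.

    Hypothetical helper for illustration; returns 0.0 when the denominator
    is zero (for example, when every prediction is the same class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified
    t_counts = Counter(y_true)                        # true occurrences per class
    p_counts = Counter(y_pred)                        # predicted occurrences per class
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    var_true = s * s - sum(v * v for v in t_counts.values())
    var_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = sqrt(var_true * var_pred)
    return numerator / denominator if denominator else 0.0


# Toy, made-up labels: an imbalanced set where the majority class dominates.
y_true = ["lung"] * 8 + ["breast"] + ["colon"]
y_pred = ["lung"] * 10  # a classifier that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
mcc = multiclass_mcc(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, mcc={mcc:.2f}")
```

In this toy case, always predicting the majority class still yields a high accuracy, while the MCC is zero because the predictions carry no information about the class. This is one reason to report MCC alongside accuracy when class sizes differ.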

## Full-text entities

- **Diseases:** metastases (MESH:D009362), Cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11385615/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11385615/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/PMC11385615/full.md

---
Source: https://tomesphere.com/paper/PMC11385615