# Breast Cancer Data Analysis Using Supervised Machine Learning Algorithms

**Authors:** Durga H Kutal, Beyza N Koseoglu

PMC · DOI: 10.7759/cureus.95011 · Cureus · 2025-10-20

## TL;DR

This paper compares supervised machine learning models for breast cancer tumor classification on a real-world Kaggle dataset and finds that random forest and polynomial SVM perform best, with AUC values of 96.3% and 96.9%, respectively.

## Contribution

The study evaluates logistic regression, decision tree, random forest, and SVMs with various kernels on a 205-observation breast cancer dataset, and identifies the most effective models and the key predictive features.

## Key findings

- Random forest and polynomial SVM achieved the highest AUC values, 96.3% and 96.9%, respectively.
- Tumor Size, Involved Lymph Nodes, Metastasis, and Age were the most significant predictors.
- Key models maintained high performance after PCA-based dimensionality reduction.
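
Since the headline results above are reported as AUC, a short illustration of what that metric measures may help. The sketch below is not taken from the paper. It computes AUC from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = positive class, 0 = negative class.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a dataset of roughly 200 observations. Library implementations generally use a sorted, threshold-based computation instead.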

## Abstract

Breast cancer is one of the most serious diseases and a leading cause of cancer-related deaths for women worldwide. This study evaluates and compares the performance of several supervised machine learning algorithms for breast cancer tumor classification, using a real-world dataset (sourced from Kaggle.com). From an initial 212 observations, the final dataset was reduced to 205 after handling missing values. We employed logistic regression, decision tree, random forest, and support vector machines (SVMs) with various kernels, focusing on model accuracy, feature importance, and the impact of dimensionality reduction. All models demonstrated strong performance, with accuracies above 87%. The most effective classifiers were the random forest and polynomial SVM, achieving the highest area under the curve (AUC) values of 96.3% and 96.9%, respectively. Feature importance analysis consistently identified Tumor Size, Involved Lymph Nodes, Metastasis, and Age as the most significant predictors. The high accuracy of simpler models, such as logistic regression and a linear SVM, is attributed to the dataset's inherent linear separability. Our findings also validate the use of principal component analysis (PCA) for feature reduction, as key models maintained high performance on the simplified dataset.
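
The comparison described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code. The data are a synthetic stand-in produced by `make_classification`, with 205 rows only to mirror the final sample size. The 70/30 split, the hyperparameters, the kernel list, and the choice of 4 principal components are all assumptions, so the printed scores will not match the paper's.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the Kaggle dataset used in the paper.
X, y = make_classification(
    n_samples=205, n_features=10, n_informative=4, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Model families named in the abstract; all settings here are illustrative guesses.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "linear SVM": SVC(kernel="linear", probability=True, random_state=0),
    "polynomial SVM": SVC(kernel="poly", degree=3, probability=True, random_state=0),
    "RBF SVM": SVC(kernel="rbf", probability=True, random_state=0),
}


def evaluate(model, use_pca=False):
    """Fit a scaled (optionally PCA-reduced) pipeline; return (accuracy, AUC)."""
    steps = [StandardScaler()]
    if use_pca:
        steps.append(PCA(n_components=4))  # component count is an assumption
    pipeline = make_pipeline(*steps, model)
    pipeline.fit(X_train, y_train)
    positive_proba = pipeline.predict_proba(X_test)[:, 1]
    accuracy = accuracy_score(y_test, pipeline.predict(X_test))
    return accuracy, roc_auc_score(y_test, positive_proba)


results = {}
for name, model in models.items():
    # clone() gives each run a fresh, unfitted copy of the estimator.
    results[name] = {
        "full": evaluate(clone(model)),
        "pca": evaluate(clone(model), use_pca=True),
    }

for name, scores in results.items():
    (acc, auc), (acc_pca, auc_pca) = scores["full"], scores["pca"]
    print(f"{name:20s} acc={acc:.3f} auc={auc:.3f} | PCA acc={acc_pca:.3f} auc={auc_pca:.3f}")
```

Scaling sits inside the pipeline so that the scaler and PCA are fitted on the training split only, which avoids leaking test-set information into the reduced features.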

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** Metastasis (MESH:D009362), Tumor (MESH:D009369), Breast Cancer (MESH:D001943)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12635515/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12635515/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12635515/full.md

---
Source: https://tomesphere.com/paper/PMC12635515