# Explainable AI-driven customer churn prediction: a multi-model ensemble approach with SHAP-based feature analysis

**Authors:** Ali El Attar, Mohammed El-Hajj

PMC · DOI: 10.3389/frai.2026.1748799 · Frontiers in Artificial Intelligence · 2026-02-10

## TL;DR

This paper uses machine learning and explainable AI to predict customer churn in telecom, identifying key factors like contract type and tenure that influence customer retention.

## Contribution

A novel multi-model ensemble approach with SHAP-based feature analysis for explainable customer churn prediction in telecommunications.

## Key findings

- Gradient boosting models (XGBoost, LightGBM, Gradient Boosting) reached accuracy, precision, recall, and F1-scores of 0.84, with XGBoost showing the best discriminative ability (AUC-ROC: 0.932).
- SHAP analysis identified contract type, tenure, and technical support as key predictors of churn.
- Threshold optimization at 0.528 reduced false negatives by 15% while balancing precision (0.90) and recall (0.91).
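The threshold finding above can be illustrated with a minimal sketch of a decision-threshold sweep. This is not the authors' procedure: the selection criterion (maximizing F1), the candidate grid, the toy labels and scores, and the helper names `confusion_counts`, `precision_recall_f1`, and `best_threshold` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) at a threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    return tp, fp, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_threshold(
    y_true: Sequence[int], scores: Sequence[float], candidates: Sequence[float]
) -> Tuple[float, float]:
    """Pick the candidate threshold with the highest F1 (assumed criterion)."""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        _, _, f1 = precision_recall_f1(*confusion_counts(y_true, scores, t))
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy data (hypothetical): 1 = churned, 0 = retained.
y_true: List[int] = [1, 1, 1, 0, 0, 1, 0, 0]
scores: List[float] = [0.92, 0.81, 0.47, 0.44, 0.30, 0.58, 0.52, 0.10]
candidates = [i / 100 for i in range(5, 96)]

threshold, f1 = best_threshold(y_true, scores, candidates)
fn_default = confusion_counts(y_true, scores, 0.5)[2]
fn_tuned = confusion_counts(y_true, scores, threshold)[2]
print(f"threshold={threshold:.2f} f1={f1:.3f} fn: {fn_default} -> {fn_tuned}")
```

On this toy data, moving away from the default 0.5 cutoff recovers a churner that would otherwise be missed. In practice the sweep would be run on a held-out validation split rather than on the test set.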

## Abstract

Customer churn prediction is critical for telecommunications companies to maintain profitability and inform retention strategies. This study builds upon existing work by implementing a comprehensive machine learning framework using the Telco Customer Churn dataset (n = 7,043). Our methodology integrated comprehensive feature engineering, SMOTE oversampling, and training of seven machine learning models including XGBoost, Random Forest, and a Multi-layer Perceptron. Model interpretation was conducted via SHAP analysis and customer segmentation. Key results demonstrated that gradient boosting algorithms (XGBoost, LightGBM, Gradient Boosting) achieved the highest balanced performance with accuracy, precision, recall, and F1-scores of 0.84, with XGBoost attaining the best discriminative ability (AUC-ROC: 0.932). A soft-voting ensemble of the top models matched this performance (F1-score: 0.84, AUC-ROC: 0.918). SHAP analysis revealed that contract type, tenure, and technical support were the features contributing most to the model's churn predictions. Threshold optimization at 0.528 balanced precision (0.90) and recall (0.91) while reducing false negatives by 15%. The findings provide actionable insights for prioritizing high-risk customers and designing targeted retention strategies in the telecom sector.
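Soft voting, as mentioned in the abstract, averages the churn probabilities predicted by several models before applying a decision threshold. The sketch below shows only that averaging step in plain Python. The model names, the probability values, the equal weights, and the helper `soft_vote` are hypothetical; the summary does not state which models entered the ensemble or how they were weighted.

```python
from typing import Dict, List, Optional


def soft_vote(
    model_probs: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Return the weighted mean churn probability per customer across models."""
    names = list(model_probs)
    if not names:
        raise ValueError("at least one model is required")
    n_samples = len(model_probs[names[0]])
    if any(len(model_probs[name]) != n_samples for name in names):
        raise ValueError("all models must score the same number of customers")
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights (assumption)
    total = sum(weights[name] for name in names)
    return [
        sum(weights[name] * model_probs[name][i] for name in names) / total
        for i in range(n_samples)
    ]


# Hypothetical per-model churn probabilities for three customers.
probs = {
    "xgboost": [0.90, 0.40, 0.20],
    "lightgbm": [0.80, 0.55, 0.10],
    "gradient_boosting": [0.70, 0.70, 0.30],
}

ensemble = soft_vote(probs)
labels = [1 if p >= 0.5 else 0 for p in ensemble]
print([round(p, 3) for p in ensemble], labels)
```

The second customer shows why averaging probabilities differs from majority voting on hard labels: one model scores below 0.5, yet the averaged probability still crosses the cutoff.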

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12929532/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12929532/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12929532/full.md

---
Source: https://tomesphere.com/paper/PMC12929532