# Enhancing subscription fraud detection through ensemble learning: the case of Ethio telecom

**Authors:** Esubalew Asmare Desta, Kidus Workineh Azale, Abenet Alazar Hailu, Fikadu Berie Adugna, Alexander Takele Mengesha, Selamawit Fentie Belay, Habtamu Ayenew Asegie, Ayodeji Olalekan Salau

PMC · DOI: 10.1038/s41598-026-38790-3 · Scientific Reports · 2026-02-09

## TL;DR

This paper develops a subscription fraud detection model for Ethio Telecom that combines multiple classifiers through ensemble and adaptive learning, and it recommends Stacking and Adaptive Random Forest (ARF) as the most robust options.

## Contribution

The novel contribution is the application of stacking and adaptive random forest models to subscription fraud detection in the telecom sector.

## Key findings

- Stacking and Adaptive Random Forest (ARF) models showed robust performance in detecting subscription fraud.
- Feature selection with a correlation matrix and Random Forest importance analysis reduced the feature set to eight key features.
- Ensemble methods outperformed individual models like Decision Tree and Logistic Regression.
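
To illustrate how combining classifiers can differ from relying on a single one, the following is a minimal sketch of hard (majority) voting, one of the ensemble methods the paper evaluates. The three rule-based classifiers and their thresholds are hypothetical stand-ins invented for this example; they are not the paper's trained Decision Tree, Logistic Regression, or ANN models. Stacking, which the paper recommends, would replace the fixed majority vote with a meta-learner trained on the base models' outputs.

```python
from collections import Counter

# Hypothetical base classifiers: each maps a subscriber feature row to
# 1 (fraud) or 0 (legitimate). Thresholds are illustrative assumptions only.
def rule_international_share(row):
    return 1 if row["RATIO_INT_TOTAL"] > 0.5 else 0

def rule_unique_share(row):
    return 1 if row["RATIO_UNIQUE_TOTAL"] > 0.8 else 0

def rule_international_volume(row):
    return 1 if row["INT_DIALLED"] > 20 else 0

classifiers = [
    rule_international_share,
    rule_unique_share,
    rule_international_volume,
]

def majority_vote(models, row):
    """Return the label predicted by most base models (hard voting)."""
    votes = Counter(model(row) for model in models)
    # With three binary voters there is always a strict majority.
    return votes.most_common(1)[0][0]

# Two made-up subscriber rows for demonstration.
suspicious = {"RATIO_INT_TOTAL": 0.9, "RATIO_UNIQUE_TOTAL": 0.95, "INT_DIALLED": 5}
ordinary = {"RATIO_INT_TOTAL": 0.1, "RATIO_UNIQUE_TOTAL": 0.9, "INT_DIALLED": 3}

print(majority_vote(classifiers, suspicious))  # two of three rules vote fraud
print(majority_vote(classifiers, ordinary))    # only one rule votes fraud
```

In the second row a single rule fires, but the ensemble still labels the subscriber as legitimate, which is the kind of error-averaging that motivates ensembles over individual models.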

## Abstract

Telecommunication companies globally face the critical challenge of subscription fraud, which threatens both financial stability and national security. This research addresses this issue by developing an advanced fraud detection model specifically for Ethio Telecom. The model utilizes Ensemble and Adaptive Learning techniques to enhance detection accuracy by combining multiple classifiers. The study used a dataset of 1,000,000 Call Detail Records (CDRs) collected over two months known for increased fraudulent activity. After filtering out irrelevant data and aggregating multiple call records per subscriber, the dataset was refined to 349,164 records. Initially, 16 features were analyzed, with four excluded for lacking relevance. The remaining 11 features, excluding the target variable, underwent preprocessing including data cleaning, transformation, and balancing. Feature selection, utilizing Correlation Matrix and Random Forest importance analysis, led to the removal of four additional features, resulting in a final set of 8 key features, including INT_DIALLED, RATIO_INT_TOTAL, and RATIO_UNIQUE_TOTAL. Three individual models, namely Decision Tree (DT), Logistic Regression (LR), and Artificial Neural Network (ANN), were implemented alongside ensemble methods such as Bagging, Boosting, Stacking, and Voting, and adaptive models like Hoeffding Tree and Adaptive Random Forest. The findings of this research recommend Stacking and Adaptive Random Forest (ARF) as robust tools for subscription fraud detection.
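
The aggregation step described above, collapsing many call records into one row per subscriber, might look roughly like the sketch below. The record layout, the `TOTAL_CALLS` helper column, and the way the three named features are computed are all assumptions inferred from the feature names; this summary does not give the paper's definitions.

```python
from collections import defaultdict

# Hypothetical CDR rows: (caller, callee, is_international).
# The real Ethio Telecom schema is not shown in this summary.
cdrs = [
    ("A", "X", True),
    ("A", "Y", True),
    ("A", "X", False),
    ("A", "Z", True),
    ("B", "P", False),
    ("B", "P", False),
]

def aggregate_per_subscriber(records):
    """Collapse many call records into one feature row per caller."""
    total = defaultdict(int)          # calls made by each subscriber
    international = defaultdict(int)  # international calls per subscriber
    callees = defaultdict(set)        # distinct numbers dialled per subscriber
    for caller, callee, is_int in records:
        total[caller] += 1
        international[caller] += int(is_int)
        callees[caller].add(callee)

    features = {}
    for caller, n in total.items():
        features[caller] = {
            "TOTAL_CALLS": n,  # helper column, assumed for illustration
            "INT_DIALLED": international[caller],
            # Assumed meaning: share of calls that are international.
            "RATIO_INT_TOTAL": international[caller] / n,
            # Assumed meaning: distinct callees divided by total calls.
            "RATIO_UNIQUE_TOTAL": len(callees[caller]) / n,
        }
    return features

subscriber_features = aggregate_per_subscriber(cdrs)
for caller, row in sorted(subscriber_features.items()):
    print(caller, row)
```

Under these assumed definitions, subscriber A (three of four calls international, three distinct callees) gets both ratios equal to 0.75, while subscriber B (two domestic calls to one number) gets 0.0 and 0.5.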

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12953688/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12953688/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/PMC12953688/full.md

---
Source: https://tomesphere.com/paper/PMC12953688