# Novel Sewer Defect Prediction Leveraging Advanced Machine Learning (ML) Models

**Authors:** Vannary Seng, Barbara J. Lence, Sudhir Kshirsagar, Srujana Rangapuram, Pavan Saranguhewa

PMC · DOI: 10.1002/wer.70338 · Water Environment Research · 2026-03-19

## TL;DR

This paper introduces machine learning models that predict infiltration and structural defects in individual sewer pipes from pipe characteristics and location data, supporting the asset management of sewer networks.

## Contribution

The study presents a novel ML-based approach for predicting infiltration and structural defects in sewer pipes, using SHAP to identify key predictors.
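SHAP attributes a prediction to individual features using Shapley values. As a minimal sketch of that idea, the block below computes exact Shapley values by brute force for a toy scoring function. The function `toy_defect_score`, the feature names, and all coefficients are hypothetical and are not taken from the paper; the SHAP library itself uses much faster algorithms for tree models than this enumeration.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features are switched from the baseline
    value to the instance value."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_defect_score(pipe):
    """Hypothetical risk score with an age/location interaction (illustration only)."""
    score = 0.01 * pipe["age_years"]
    score += 0.2 * pipe["in_high_risk_zone"]
    score += 0.005 * pipe["age_years"] * pipe["in_high_risk_zone"]
    return score


# Hypothetical pipe to explain, and a hypothetical reference pipe.
pipe = {"age_years": 60, "in_high_risk_zone": 1, "diameter_mm": 300}
reference = {"age_years": 20, "in_high_risk_zone": 0, "diameter_mm": 300}

phi = shapley_values(toy_defect_score, pipe, reference)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score of the pipe and the score of the reference, and a feature that the toy score ignores (here `diameter_mm`) receives zero. Brute-force enumeration is only practical for a handful of features.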

## Key findings

- LightGBM-based models with cost-sensitive learning performed best overall; the best-performing model achieved an AUC-ROC of 0.79 and an AUC-PR of 0.62.
- SHAP analysis identified pipe location and pipe age as the most important predictors of both infiltration and structural defects.
- Utility-specific models were developed for two Western Canada utilities, demonstrating the approach's adaptability.
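To make the two reported metrics concrete, here is a minimal pure-Python sketch of how AUC-ROC and AUC-PR can be computed from predicted scores. The labels and scores are made up for illustration, and AUC-PR is computed here as average precision, which is one common estimator; the paper may have used a different one. The sketch assumes no tied scores.

```python
def auc_roc(labels, scores):
    """Probability that a randomly chosen positive scores higher than a
    randomly chosen negative (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve: the mean of the
    precision measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical pipes: 1 = defect observed, 0 = no defect observed.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

roc = auc_roc(labels, scores)
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)
print(f"AUC-ROC={roc:.3f}  AUC-PR={ap:.3f}  positive rate={prevalence:.3f}")
```

A random ranking gives an AUC-ROC near 0.5 but an AUC-PR near the positive rate, so AUC-PR values are best read against the share of positive examples in the data.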

## Abstract

A novel approach to sewer network assessment is presented that uses artificial intelligence (AI)/machine learning (ML) to predict infiltration and structural defect occurrences in each pipe instead of estimating the traditional criteria‐based overall pipe condition or likelihood of failure. A comparative analysis of four decision tree‐based ML models, and their use in predicting the defect locations in sewer networks, is presented. The models are developed using data from closed‐circuit television (CCTV) inspections coupled with additional pipe information and inspection reports. The ML approach uses such information from two utilities to create utility‐specific defect prediction models. The class imbalance in the data, due to more defects than nondefects, is addressed with three methods, and the hyperparameters, settings that define the model architecture, are optimized via a repeated stratified k‐fold cross‐validation grid search. The performance of the models is assessed using the areas under the receiver operating characteristic (AUC‐ROC) and precision‐recall (AUC‐PR) curves. LightGBM‐based models, with the cost‐sensitive learning method for addressing class imbalance, show the best performance overall when predicting either type of defect for both utilities. The best performing model achieves an AUC‐ROC of 0.79 and an AUC‐PR of 0.62. For the two utilities investigated, an application of SHapley Additive exPlanations (SHAP) shows that the most important features for indicating both types of defects are “pipe location” and “pipe age.”
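The abstract names cost-sensitive learning and repeated stratified k-fold cross-validation without giving details. The sketch below illustrates both ideas in plain Python under stated assumptions: the "balanced" weighting formula is one common choice and not necessarily the one used in the paper, and the labels, split counts, and function names are hypothetical. Class 1 is made the smaller class purely for illustration.

```python
import random
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Cost-sensitive weights: n_samples / (n_classes * class_count),
    so errors on the rarer class cost more during training."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {cls: n_samples / (len(counts) * count) for cls, count in counts.items()}


def repeated_stratified_kfold(labels, n_splits, n_repeats, seed=0):
    """Yield (train_idx, test_idx) pairs. Each repeat reshuffles the data,
    and each fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin across the folds.
            for position, idx in enumerate(shuffled):
                folds[position % n_splits].append(idx)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train_idx, test_idx


# Hypothetical labels with a 1:4 class ratio.
labels = [1] * 10 + [0] * 40
weights = balanced_class_weights(labels)
splits = list(repeated_stratified_kfold(labels, n_splits=5, n_repeats=3))
print(weights)
print(len(splits), "train/test splits")
```

In a grid search, each candidate hyperparameter setting would be trained with these weights on every training split and scored on the matching test split, and the setting with the best average score would be kept.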

Machine learning models are developed to predict infiltration and structural defect occurrences in sewers based on pipe characteristics and locations.

Models predict specific defects rather than overall pipe condition, making them suitable for data from various condition assessment standards.

SHapley Additive exPlanations (SHAP) analyses are applied to identify pipe characteristics that are most associated with the occurrence of infiltration and structural defects.

Models may be used to undertake the asset management of sewer networks.

An evaluation of four decision tree‐based machine learning models is conducted to predict infiltration and structural defect occurrences in sewer networks using pipe characteristics and locations. Utility‐specific models are developed for two Western Canada utilities, showing that pipe location and pipe age are two of the key predictors of both defect types.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13001702/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13001702/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC13001702/full.md

---
Source: https://tomesphere.com/paper/PMC13001702