# Toward robust surgical phase recognition via deep ensemble learning

**Authors:** Flakë Bajraktari, Lina Hauser, Peter P. Pott

PMC · DOI: 10.1007/s11548-025-03543-6 · International Journal of Computer Assisted Radiology and Surgery · 2025-11-08

## TL;DR

This paper applies deep ensemble learning to surgical phase recognition on the Cholec80 dataset and shows that ensembles of diverse models outperform the best individual models.

## Contribution

The study introduces an ensemble learning approach that combines diverse deep learning models to enhance surgical phase recognition accuracy and reliability.

## Key findings

- The optimal ensemble improved accuracy, F1-score, and Jaccard Index by 1.48 %, 3.68 %, and 5.43 %, respectively, over the best individual models.
- Ensembles with high model diversity outperformed less diverse ensembles.
- Among the ensemble strategies tested, majority voting achieved the highest F1-score, followed by the proposed neural-network meta-model StackingNet.
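
To make the majority-voting strategy concrete, the following is a minimal sketch of frame-wise hard voting over the phase labels predicted by several models. It is an illustration, not the authors' implementation: the function name `majority_vote`, the toy predictions, and the tie-breaking rule (the earliest model in the list wins) are all assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-frame phase labels from several models by hard voting.

    predictions: list of label sequences, one per model, all of equal length.
    Tie-break (an assumption, not from the paper): among the tied labels,
    the one voted for by the earliest model in the list is chosen.
    """
    if not predictions:
        raise ValueError("need at least one model")
    n_frames = len(predictions[0])
    if any(len(p) != n_frames for p in predictions):
        raise ValueError("all models must predict the same number of frames")

    fused = []
    for frame_votes in zip(*predictions):
        counts = Counter(frame_votes)
        top = max(counts.values())
        # First vote (in model order) that reaches the top count.
        fused.append(next(v for v in frame_votes if counts[v] == top))
    return fused


# Hypothetical per-frame phase predictions from three models.
model_a = [0, 0, 1, 1, 2]
model_b = [0, 1, 1, 2, 2]
model_c = [0, 0, 1, 2, 3]

print(majority_vote([model_a, model_b, model_c]))  # [0, 0, 1, 2, 2]
```

Hard voting needs only the predicted labels, so it can combine architectures whose confidence scores are not directly comparable.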

## Abstract

Automatic recognition of surgical workflows is a complex yet essential task of context-aware systems in the operating room. However, achieving high accuracy in phase recognition remains a challenge due to the complexity of surgical procedures. While recent deep learning models have made significant progress, individual models often exhibit limitations—some may excel at capturing spatial features, while others are better at modeling temporal dependencies or handling class imbalance.

This study investigates the use of ensemble learning to combine the complementary strengths of diverse architectures, aiming to mitigate individual model weaknesses and improve performance in surgical phase recognition using the Cholec80 dataset. A variety of advanced deep learning architectures was integrated into a single ensemble. Models were carefully selected and tuned to ensure diversity, resulting in a final set of 15 unique ensembles. Ensemble strategies were explored to determine the most effective method for combining the distinct models.
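
The abstract does not say how model diversity was quantified. One common proxy, shown here purely as an assumption, is the mean pairwise disagreement rate: the fraction of frames on which two models predict different phases, averaged over all model pairs. The function names and toy predictions are hypothetical.

```python
from itertools import combinations


def disagreement(pred_a, pred_b):
    """Fraction of frames on which two models predict different phases."""
    if len(pred_a) != len(pred_b) or not pred_a:
        raise ValueError("predictions must be non-empty and of equal length")
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def mean_pairwise_disagreement(predictions):
    """Average disagreement over all unordered pairs of models."""
    pairs = list(combinations(predictions, 2))
    if not pairs:
        raise ValueError("need at least two models")
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical per-frame phase predictions from three models.
preds = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 2],
]
print(mean_pairwise_disagreement(preds))
```

A value of 0 means all models agree on every frame; higher values indicate that the models make different errors, which is the situation in which combining them can help.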

The results demonstrated that ensemble learning significantly improved performance. Among the ensemble strategies tested, majority voting achieved the highest F1-score, followed by the proposed artificial neural network StackingNet. Ensembles with high model diversity showed superior performance compared to those with lower diversity. The optimal ensemble configuration integrated top-performing models from different architectures, leading to improvements in accuracy, F1-score, and Jaccard Index by 1.48 %, 3.68 %, and 5.43 %, respectively, compared to the best individual models.
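
For readers unfamiliar with the three reported metrics, the sketch below computes frame-level accuracy, macro-averaged F1-score, and macro-averaged Jaccard Index from true and predicted phase labels. The averaging scheme is an assumption: the abstract does not state whether metrics were averaged per class, per video, or otherwise, nor whether the reported improvements are absolute or relative. The function name `phase_metrics` and the toy labels are hypothetical.

```python
def phase_metrics(y_true, y_pred):
    """Return (accuracy, macro F1, macro Jaccard) over frame-level labels.

    Macro averaging over the classes present in y_true or y_pred is an
    assumption made for illustration.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    f1_scores, jaccards = [], []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # tp + fp + fn >= 1 because c occurs in at least one sequence.
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
        jaccards.append(tp / (tp + fp + fn))

    n = len(f1_scores)
    return accuracy, sum(f1_scores) / n, sum(jaccards) / n


# Hypothetical ground truth and prediction for six frames.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
acc, f1, jac = phase_metrics(y_true, y_pred)
print(f"accuracy={acc:.4f} f1={f1:.4f} jaccard={jac:.4f}")
```

For a single class, the two overlap measures are linked by J = F1 / (2 - F1), so the Jaccard Index is never larger than the F1-score and the same change in predictions generally produces different increments in the two metrics.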

This study demonstrates that ensemble learning can substantially enhance surgical phase recognition by leveraging the complementary strengths of diverse deep learning models. Ensemble size, diversity, and meta-model selection were identified as key factors influencing performance. The resulting improvements translate into clinically meaningful benefits by enabling more reliable context-aware guidance, reducing misclassifications during critical phases, and improving surgeons’ trust in artificial intelligence (AI) systems.
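
The role of a meta-model can be illustrated with a minimal stacking sketch: the class probabilities of the base models are concatenated per frame and a small learned classifier maps them to the final phase. The code below is not the StackingNet architecture, whose details are not given in this summary; it swaps in a single-layer softmax classifier trained with plain stochastic gradient descent. All names, hyperparameters, and data are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stack_features(per_model_probs):
    """Concatenate, frame by frame, the probability vectors of all models."""
    return [[p for probs in frame for p in probs]
            for frame in zip(*per_model_probs)]


def scores_for(W, b, x):
    return [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(b))]


def train_meta(features, labels, n_classes, lr=0.5, epochs=500):
    """Softmax regression via SGD; a stand-in for a neural meta-model."""
    n_features = len(features[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(scores_for(W, b, x))
            for c in range(n_classes):
                # Gradient of cross-entropy with respect to the class score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_features):
                    W[c][j] -= lr * grad * x[j]
                b[c] -= lr * grad
    return W, b


def predict_meta(W, b, x):
    s = scores_for(W, b, x)
    return max(range(len(s)), key=s.__getitem__)


# Hypothetical two-phase probabilities for four frames from two base models.
probs_a = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]  # reliable model
probs_b = [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5], [0.3, 0.7]]  # noisier model
labels = [0, 0, 1, 1]

features = stack_features([probs_a, probs_b])
W, b = train_meta(features, labels, n_classes=2)
print([predict_meta(W, b, x) for x in features])
```

In practice the meta-model would be fitted on predictions for held-out data rather than on the frames used to train the base models, so that it does not simply learn to trust overfitted outputs.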

## Full-text entities

- **Diseases:** cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12929341/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12929341/full.md

## References

2 references — full list in the complete paper: https://tomesphere.com/paper/PMC12929341/full.md

---
Source: https://tomesphere.com/paper/PMC12929341