# Inter-observer variability in radiotherapy contouring with the use of autocontouring software: A systematic review

**Authors:** Polly Darby, Emily Kilgour, Chee Kin Then, Andrew Bromiley, John McLellan, Anne E. Kiltie

PMC · DOI: 10.1016/j.ctro.2026.101144 · Clinical and Translational Radiation Oncology · 2026-03-09

## TL;DR

Autocontouring software, especially deep learning models, reduces variability in radiotherapy contouring, but performance depends on anatomical complexity.

## Contribution

Systematic review showing deep learning autocontouring outperforms atlas-based methods in reducing inter-observer variability.

## Key findings

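- Edited deep learning autocontours frequently reach mean Dice Similarity Coefficient (DSC) values above 0.85 for clinical and planning target volumes and above 0.90 for well-defined organs at risk.
- Autocontouring reduces variability most for the lungs, bladder, and heart, but offers limited benefit for anatomically indistinct targets such as the prostate bed and pelvic lymph nodes.
- Manual review is still required because discrepancies between observers persist for certain structures despite automation.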

## Abstract

- Autocontouring significantly reduces inter-observer variability in delineation.
- Deep learning segmentation outperforms atlas-based methods in consistency.
- Edited contours achieve high Dice scores for well-defined targets and OARs.
- Reduced variability for lungs, bladder, heart; limited benefit in complex anatomy.
- Clinical utility remains structure-dependent; manual review is still required.

Inter-observer variability (IOV) in radiotherapy contouring remains a significant source of uncertainty, especially for complex anatomical regions. Autocontouring software, including both atlas-based and deep-learning-based models, aims to improve contouring consistency and reduce workload. A systematic review was conducted in accordance with PRISMA guidelines to evaluate the impact of autocontouring software on IOV. Twenty-five eligible studies that quantitatively assessed IOV using these tools were identified. Extracted data included anatomical site, observer and case numbers, contouring method and evaluation metrics. Most studies reported significant reductions in IOV with the use of autocontouring. With deep learning methods, edited autocontours frequently achieved mean Dice Similarity Coefficient (DSC) values above 0.85 for clinical and planning target volumes and above 0.90 for organs at risk (OARs) with well-defined anatomy. Deep-learning-based models demonstrated greater consistency than atlas-based methods. Structures such as the lungs, heart and bladder showed the most substantial improvements, while anatomically indistinct targets such as the prostate bed and pelvic lymph nodes showed limited benefit. Discrepancies remained between observers for certain structures despite the use of automation. Overall, autocontouring tools, particularly deep-learning models, improve contouring consistency in radiotherapy planning, but performance is strongly influenced by anatomical complexity and segmentation method. Larger multi-institutional studies and standardised evaluation protocols are needed to support widespread clinical adoption and strengthen quality assurance.
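
For readers unfamiliar with the metric, the DSC between two contours A and B is conventionally defined as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical contours). The following is a minimal Python sketch of that definition, not code from the review or from any included study. The function names `dice` and `mean_pairwise_dice`, the toy one-dimensional masks, and the choice of mean pairwise DSC as a summary of agreement between observers are all illustrative assumptions; individual studies may summarise IOV differently.

```python
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks.

    Masks are flat sequences of 0/1 values, one entry per voxel.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def mean_pairwise_dice(masks):
    """Average DSC over every pair of observers (hypothetical IOV summary)."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("need at least two observer masks")
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


# Toy example: three observers contouring the same six-voxel strip.
observer_1 = [0, 1, 1, 1, 0, 0]
observer_2 = [0, 1, 1, 0, 0, 0]
observer_3 = [0, 0, 1, 1, 1, 0]

agreement = mean_pairwise_dice([observer_1, observer_2, observer_3])
print(f"mean pairwise DSC: {agreement:.3f}")
```

A higher mean pairwise DSC indicates closer agreement between observers, so under this assumed summary a reduction in IOV would show up as an increase in the value when observers edit autocontours rather than contouring from scratch.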

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12996195/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12996195/full.md

## References

65 references — full list in the complete paper: https://tomesphere.com/paper/PMC12996195/full.md

---
Source: https://tomesphere.com/paper/PMC12996195