# Generalizability of YOLOv11 models for mesiodens detection in pediatric panoramic radiographs

**Authors:** Henri Hartman, Adinara Savero, Tjinta Kaulika Tamim, Farina Pramanik, Saiful Akbar, Denny Nurdin, Arlette Suzy Setiawan

PMC · DOI: 10.1186/s12903-026-07713-z · 2026-01-28

## TL;DR

This study compares YOLOv11 models for detecting mesiodens in pediatric panoramic radiographs and finds that YOLOv11l, trained on Ultralytics, carries its validation performance over to inference data most reliably.

## Contribution

The novel contribution is an evaluation of how well YOLOv11 models trained on two cloud platforms (Roboflow and Ultralytics) generalize for mesiodens detection on a pediatric panoramic radiograph dataset.

## Key findings

- The YOLOv11l model on Ultralytics showed stable performance with an inference F1-score of 96.78%.
- The YOLOv11 Accurate model on Roboflow achieved the highest validation metrics, but its inference F1-score dropped to 84.30%.
- The study emphasizes the importance of model generalization over peak validation metrics for clinical use.

## Abstract

Mesiodens is a supernumerary tooth in the anterior maxilla whose reported prevalence varies. Accurate and precise detection is needed to prevent later complications.

This study aimed to evaluate and compare YOLOv11-based convolutional neural network (CNN) models for mesiodens detection in pediatric panoramic radiographs using two cloud-based platforms, Roboflow and Ultralytics.

This study involved 480 pediatric panoramic radiographs, consisting of 240 mesiodens and 240 no mesiodens images, annotated using Roboflow, with a region of interest (ROI) focused on the anterior maxillary area. The dataset was divided into training (70%), validation (20%), and testing (10%) subsets. Model performance was evaluated using mean average precision (mAP), precision, recall, and F1-score.
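The 70/20/10 split can be sketched as follows. This is a minimal illustration, not the authors' procedure: the summary does not say whether the split was stratified by class, so the per-class shuffling, the hypothetical `stratified_split` helper, the seed, and the placeholder file names are all assumptions. With 240 images per class it yields 336 training, 96 validation, and 48 test images.

```python
import random


def stratified_split(items_by_class, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle each class separately and cut it into train/val/test subsets.

    Hypothetical helper: splitting per class (stratified) keeps the 1:1
    mesiodens/no-mesiodens ratio in every subset. The test subset receives
    whatever remains after rounding the train and validation counts.
    """
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_train = round(len(items) * train_frac)
        n_val = round(len(items) * val_frac)
        splits["train"] += [(x, label) for x in items[:n_train]]
        splits["val"] += [(x, label) for x in items[n_train:n_train + n_val]]
        splits["test"] += [(x, label) for x in items[n_train + n_val:]]
    return splits


# Placeholder file names standing in for the 240 + 240 radiographs.
data = {
    "mesiodens": [f"mesiodens_{i:03d}.png" for i in range(240)],
    "no_mesiodens": [f"normal_{i:03d}.png" for i in range(240)],
}
splits = stratified_split(data)
print({name: len(subset) for name, subset in splits.items()})
# {'train': 336, 'val': 96, 'test': 48}
```

Each class contributes 168 training, 48 validation, and 24 test images, so no subset is dominated by one class.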

The YOLOv11 Accurate model trained on the Roboflow platform achieved the highest validation mAP50 at 99.2% and recall at 100%. However, its performance declined on inference data, where the F1-score was 84.30%. In contrast, the YOLOv11l model trained on the Ultralytics platform showed more stable performance: its validation mAP was 99.3%, precision was 99.11%, and recall was 94.57%, while the inference F1-score was 96.78%, showing robust generalizability and supporting its suitability for clinical practice.
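To make the reported metrics concrete, the sketch below scores box predictions at an IoU threshold of 0.5 (the "50" in mAP50). It is a generic illustration under stated assumptions, not the evaluation code of Roboflow or Ultralytics: the helper names, the greedy matching rule, the 0.25 confidence threshold, and the toy boxes are hypothetical, and the platforms may interpolate the precision-recall curve differently (for example, at fixed recall points) from the all-point AP used here. Because mesiodens is the only class, mAP50 reduces to the AP at IoU 0.5 for that class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, iou_thr=0.5):
    """Greedily match predictions to ground-truth boxes, highest confidence first.

    preds: list of (image_id, confidence, box); gts: dict image_id -> list of boxes.
    Each ground-truth box is matched at most once. Returns (confidence, is_tp)
    pairs sorted by descending confidence.
    """
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    results = []
    for img, conf, box in sorted(preds, key=lambda p: -p[1]):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if not used[img][j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used[img][best_j] = True
        results.append((conf, best_j >= 0))
    return results


def precision_recall_f1(matches, n_gt, conf_thr=0.25):
    """Precision, recall, and F1 for detections at or above a confidence threshold."""
    kept = [is_tp for conf, is_tp in matches if conf >= conf_thr]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = n_gt - tp
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / (tp + fn) if n_gt else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_precision(matches, n_gt):
    """All-point interpolated AP: area under the monotone precision-recall envelope."""
    recalls, precisions = [0.0], [0.0]
    tp_cum = fp_cum = 0
    for _, is_tp in matches:  # already sorted by descending confidence
        tp_cum += is_tp
        fp_cum += not is_tp
        recalls.append(tp_cum / n_gt)
        precisions.append(tp_cum / (tp_cum + fp_cum))
    # Replace each precision with the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    return sum((recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls)))


# Hypothetical toy data: two radiographs with one mesiodens each, one without.
gts = {
    "img1": [(100, 100, 150, 160)],
    "img2": [(200, 120, 240, 170)],
    "img3": [],
}
preds = [
    ("img1", 0.92, (102, 98, 151, 158)),  # IoU about 0.88: true positive
    ("img2", 0.40, (230, 160, 280, 210)),  # IoU about 0.02: false positive
    ("img3", 0.30, (50, 50, 80, 90)),  # no mesiodens present: false positive
]
n_gt = sum(len(boxes) for boxes in gts.values())
matches = match_detections(preds, gts)
print(precision_recall_f1(matches, n_gt))  # precision 1/3, recall 1/2, F1 about 0.4
print(average_precision(matches, n_gt))    # AP50 of 0.5 for this toy example
```

Raising `conf_thr` to 0.5 in this toy example drops both false positives, lifting precision from 1/3 to 1 while recall stays at 1/2. This trade-off is why a single F1 figure depends on the chosen confidence threshold.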

YOLOv11l demonstrated the most reliable balance between validation and inference performance, suggesting suitability for clinical application. These results highlight the importance of prioritizing model generalization over peak validation metrics. Future studies should therefore evaluate multicenter datasets and broader clinical settings to confirm robustness and applicability in diverse pediatric populations.

## Linked entities

- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Diseases:** cleft palate (MESH:D002972), Mesiodens (MESH:C538336), cleft lip (MESH:D002971), dental anomalies (OMIM:614188), Supernumerary teeth (MESH:D014096), Gardner syndrome (MESH:D005736), diastema (MESH:D003970), cleidocranial dysplasia (MESH:D002973), hyperactivity (MESH:D006948), root resorption (MESH:D012391)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12924467/full.md

---
Source: https://tomesphere.com/paper/PMC12924467