# AKUDENTAL teeth instance segmentation dataset: a cross-dataset analysis

**Authors:** Melih Oz, Aycan Sengul, Mukerrem Hatipoglu, Taner Danisman

PMC · DOI: 10.1186/s12903-025-07645-0 · 2026-01-12

## TL;DR

The paper introduces the AKUDENTAL dataset for dental radiograph segmentation and shows that annotation differences across datasets significantly affect AI model performance.

## Contribution

The contribution is twofold: the AKUDENTAL dataset (333 panoramic radiographs with 9,956 annotated structures) and a cross-dataset analysis that identifies annotation inconsistencies as a key challenge.

## Key findings

- Variations in annotation protocols across datasets are a significant factor in cross-dataset performance differences.
- Mean Average Precision (mAP) for multiclass detection varied widely in the cross-dataset evaluation, from 0.34 on DENTEX to 0.71 on the Dual-labeled dataset.
- Annotation inconsistencies are a critical barrier to developing universally applicable dental AI models.

## Abstract

Artificial Intelligence (AI) is reshaping diagnostics and disease prevention in the dental domain. Panoramic X-ray imaging is central to this progress but demands large, high-quality annotated datasets. We therefore present AKUDENTAL, a new dataset for instance segmentation of dental radiographs, to serve as a resource for model development and to assess the challenges of generalizability.

We annotated 333 panoramic images, labeling 9,956 structures across 32 individual teeth and three restorative categories: implants, bridges, and crown–filling. We established semantic segmentation, object detection, and instance segmentation baselines using UNet, DeepLabV3+, YOLOv11, and Mask R-CNN models. Generalizability was assessed via 5-fold cross-validation and a cross-dataset evaluation on the Tufts, DENTEX, and Dual-labeled datasets.
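As an illustration of the 5-fold cross-validation mentioned above, the following is a minimal sketch of how 333 image indices could be partitioned into five folds. The shuffling, the seed, and the helper name `k_fold_splits` are assumptions for illustration; the summary does not state how the authors built their folds.

```python
import random


def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train_indices, val_indices) for each of k folds.

    Hypothetical helper: shuffles indices once, then deals them
    round-robin into k folds so fold sizes differ by at most one.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


# 333 is the number of annotated panoramic images reported in the abstract.
splits = list(k_fold_splits(333, k=5))
fold_sizes = [len(val) for _, val in splits]
print(fold_sizes)
```

Each image lands in exactly one validation fold, so every image is used for evaluation once and for training four times.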

The cross-dataset evaluation on the Tufts, DENTEX, and Dual-labeled datasets revealed that variations in annotation protocols are a significant factor contributing to performance differences. Performance varied widely, with mean Average Precision (mAP) scores for multiclass detection ranging from a low of 0.34 on the DENTEX dataset to 0.71 on the Dual-labeled dataset. Our analysis illustrates how such discrepancies can impact the interpretation of model performance.
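To make concrete how annotation conventions alone can shift a detection score, here is a minimal sketch of single-class Average Precision at one IoU threshold. The boxes, the 0.5 threshold, the greedy matching rule, and the function names are all illustrative assumptions; this is not the paper's evaluation code, and the summary does not state which mAP variant the authors report.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """AP for one class on one image.

    preds: list of (score, box); gts: list of boxes.
    Predictions are matched greedily, highest score first, to the
    unmatched ground-truth box with the largest IoU.
    """
    if not gts:
        return 0.0
    matched, tp_flags = set(), []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            iou = box_iou(box, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        is_tp = best_j is not None and best_iou >= iou_thr
        if is_tp:
            matched.add(best_j)
        tp_flags.append(is_tp)

    # Precision and recall after each ranked prediction.
    tp, precisions, recalls = 0, [], []
    for rank, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))

    # Make precision non-increasing, then sum the area under the curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Same predictions, scored against two hypothetical annotation protocols.
preds = [(0.9, (10, 10, 50, 90)), (0.8, (60, 10, 100, 90))]
gts_protocol_a = [(10, 10, 50, 90), (60, 10, 100, 90)]  # whole structure boxed
gts_protocol_b = [(10, 10, 50, 90), (60, 10, 100, 40)]  # second box covers only part

ap_a = average_precision(preds, gts_protocol_a)
ap_b = average_precision(preds, gts_protocol_b)
print(ap_a, ap_b)
```

In this toy case the predictions are identical, yet the score halves because the second protocol draws a smaller box around the same structure, which is the kind of discrepancy the cross-dataset analysis points to.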

The AKUDENTAL dataset provides a robust new resource for the field. The performance disparities revealed in our cross-dataset analysis do not reflect model limitations; rather, they strengthen the argument that annotation inconsistencies are a critical barrier to developing universally applicable AI. This highlights the imperative for broader standardization in data annotation, extending beyond tooth identification to encompass common dental procedures and restorations.

## Full-text entities

- **Diseases:** tooth loss (MESH:D016388), fatigue (MESH:D005221), supernumerary (MESH:D014096), caries (MESH:D003731)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12874774/full.md

---
Source: https://tomesphere.com/paper/PMC12874774