# Assessing observer-dependent dental age estimation procedures: intra- and inter-observer reliability across four well established radiographic systems for dental analysis

**Authors:** Nikolaos Angelakopoulos, Rizky Merdietio Boedi, Ademir Franco, Nikita Polukhin, Akiko Kumagai, Ivan Galic, Jeta Kelmendi, Israel Soriano Vázquez, Sang-Seob Lee, Galina Zolotenkova, Roberto Scendoni, Stefano De Luca

PMC · DOI: 10.1007/s00414-025-03616-w · International Journal of Legal Medicine · 2025-10-25

## TL;DR

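This study evaluates the intra- and inter-observer reliability of four radiographic dental age estimation methods used in forensic contexts. All four proved highly reliable, with the highest inter-observer agreement for I3M (ICC 0.986) and lower agreement for maxillary than for mandibular third molars.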

## Contribution

The study provides a comparative analysis of intra- and inter-observer reliability across four established dental age estimation systems.

## Key findings

- The I3M method showed the highest inter-observer agreement with an ICC of 0.986.
- Maxillary third molars consistently showed lower inter-observer agreement than mandibular ones, particularly with the DEM and GHK methods.
- All methods yielded highly reliable results, with DEM and GHK showing particularly strong performance.

## Abstract

In forensic contexts, age assessments carry substantial legal consequences, particularly in proceedings involving children and young adolescents. Dental age estimation (DAE) techniques are widely used for this purpose, especially in cases involving undocumented minors. This study assesses intra- and inter-observer reliability across four well-established radiographic systems for dental analysis: Gleiser and Hunt modified by Köhler (GHK), Demirjian (DEM), Kullman (KUL), and Cameriere’s Third Molar Maturity Index (I3M). A total of 50 panoramic radiographs from individuals aged 14 to 23.99 years were analyzed by nine qualified forensic experts. The observers assessed the developmental stages of third molars using the three staging methods (GHK, DEM, KUL) and measured the I3M using Cameriere’s metric approach. Agreement was quantified primarily with Cohen’s Kappa, Gwet’s Agreement Coefficients (AC1 and AC2), and the Intraclass Correlation Coefficient (ICC). Statistical analysis revealed high intra-observer reliability for all methods, with coefficient values indicating strong agreement within individual observers. In terms of inter-observer reliability, the I3M achieved the highest agreement (ICC 0.986), followed by DEM (AC2 0.918), GHK (AC2 0.914), and KUL (AC2 0.868). Notably, maxillary third molars consistently showed lower inter-observer agreement than mandibular third molars, particularly when assessed using the DEM and GHK methods. In cases where a tooth could not be staged or measured, the highest inter-observer agreement was observed for the KUL method (AC1 0.993), followed by I3M (AC1 0.988), with DEM and GHK demonstrating equivalent levels of agreement (AC1 0.954). All of the tested methods yielded highly reliable results, especially DEM and GHK. The choice of a staging method should be guided by the specific objectives of each study. Moreover, while the I3M method demonstrated high reliability values, obtaining identical repeated measurements was nearly impossible due to its metric approach.
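To make two of the agreement statistics named in the abstract concrete, the following is a minimal sketch of unweighted Cohen's Kappa and Gwet's AC1 for two raters and nominal categories. It is not the authors' analysis code: the study involved nine observers and also used the weighted AC2 and the ICC, none of which are covered here. The function names and the stage labels in the usage example are hypothetical.

```python
from collections import Counter


def _validate(rater_a, rater_b):
    """Both raters must have scored the same non-empty set of items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")


def observed_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same category."""
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's Kappa for two raters.

    Chance agreement is the sum over categories of the product of the
    two raters' marginal proportions.
    """
    _validate(rater_a, rater_b)
    n = len(rater_a)
    p_o = observed_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    p_e = sum((count_a[k] / n) * (count_b[k] / n) for k in categories)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


def gwet_ac1(rater_a, rater_b, categories=None):
    """Gwet's AC1 for two raters and nominal categories.

    Chance agreement is sum_k pi_k * (1 - pi_k) / (q - 1), where pi_k is
    the mean of the two raters' marginal proportions for category k and
    q is the number of categories. Pass `categories` explicitly if some
    valid categories were never used by either rater.
    """
    _validate(rater_a, rater_b)
    n = len(rater_a)
    p_o = observed_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    if categories is None:
        categories = sorted(set(count_a) | set(count_b))
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    pi = [(count_a[k] / n + count_b[k] / n) / 2.0 for k in categories]
    p_e = sum(p * (1.0 - p) for p in pi) / (q - 1)
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical stage labels assigned by two observers to four teeth.
    obs_1 = ["A", "A", "B", "B"]
    obs_2 = ["A", "A", "B", "A"]
    print("kappa:", cohen_kappa(obs_1, obs_2))
    print("AC1:  ", gwet_ac1(obs_1, obs_2))
```

Both coefficients share the form (observed agreement − chance agreement) / (1 − chance agreement) and differ only in how chance agreement is estimated.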

## Full-text entities

- **Diseases:** tumors (MESH:D009369), dental abnormalities (MESH:D014071), third (MESH:D015840), mandibular or maxillary fractures (MESH:D008337), infection (MESH:D007239), DAE (MESH:D019588), supernumerary teeth (MESH:D014096)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12957622/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12957622/full.md

---
Source: https://tomesphere.com/paper/PMC12957622