# Potential of Artificial Intelligence for Bone Age Assessment in Iranian Children and Adolescents: An Exploratory Study

**Authors:** Mehrzad Lotfi, Nahid Abolpour, Mohammadreza Ghasemi, Hajar Heydari, Reza Pourghayumi

PMC · DOI: 10.34172/aim.32070 · Archives of Iranian Medicine · 2025-04-01

## TL;DR

This study explores using AI to assess bone age in Iranian children, finding it accurate but needing improvement, especially for boys.

## Contribution

A deep learning model for bone age assessment tailored to Iranian children is proposed and evaluated.

## Key findings

- The AI model agreed more closely with the radiologists' reference for girls (ICC 0.82) than for boys (ICC 0.74).
- Mean absolute error was 0.59 years for boys and 0.61 years for girls.
- The 95% limits of agreement (-0.8 to 1.2 years for boys, -0.6 to 1.0 years for girls) are relatively wide, suggesting room for improvement.

## Abstract

This study investigated whether the bone age (BA) of Iranian children could be accurately assessed via an artificial intelligence (AI) system. Accurate assessment of skeletal maturity is crucial for diagnosing and treating various musculoskeletal disorders, and is traditionally achieved through manual comparison with the Greulich-Pyle atlas. This process, however, is subjective and time-consuming. Recent advances in deep learning offer more efficient and consistent BA evaluations.

A total of 555 left-hand radiographs (220 boys and 335 girls) were collected from children aged 1–18 years who presented to a tertiary research hospital. The reference BA was determined via the Greulich and Pyle (GP) method by two radiologists in consensus. The BA was then estimated using a deep learning model specifically developed for this population. Model performance was evaluated using multiple metrics: mean square error (MSE), mean absolute error (MAE), intra-class correlation coefficient (ICC), and 95% limits of agreement (LoA). Gender-specific results were analyzed separately.
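As a rough illustration of how the four agreement metrics named above can be computed from paired reference and model bone ages, here is a minimal Python sketch. It is not the authors' code. The function name `agreement_metrics` and the sample values are hypothetical, the LoA is computed in the usual Bland-Altman way (mean difference ± 1.96 standard deviations), and the ICC form is assumed to be ICC(2,1) (two-way random effects, absolute agreement, single measurement), since this summary does not state which form the study used.

```python
from statistics import mean, stdev


def agreement_metrics(reference, predicted):
    """Compare model bone ages with reference bone ages (both in years).

    Returns MAE, MSE, Bland-Altman 95% limits of agreement, and ICC(2,1).
    The ICC variant is an assumption; the summary does not specify it.
    """
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    n = len(reference)  # number of radiographs
    k = 2               # two "raters": radiologist consensus and the model
    diffs = [p - r for r, p in zip(reference, predicted)]

    mae = mean(abs(d) for d in diffs)
    mse = mean(d * d for d in diffs)

    # Bland-Altman limits: mean difference +/- 1.96 * SD of the differences.
    bias = mean(diffs)
    sd = stdev(diffs)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)

    # Two-way ANOVA sums of squares (rows = subjects, columns = raters).
    all_values = list(reference) + list(predicted)
    grand = mean(all_values)
    row_means = [(r + p) / k for r, p in zip(reference, predicted)]
    col_means = [mean(reference), mean(predicted)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in all_values)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # ICC(2,1): single-measurement absolute agreement.
    icc = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

    return {"mae": mae, "mse": mse, "loa": loa, "icc": icc}


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the study.
    reference_ba = [2.0, 5.5, 9.0, 12.5, 16.0]
    model_ba = [2.4, 5.0, 9.6, 12.0, 16.5]
    print(agreement_metrics(reference_ba, model_ba))
```

To mirror the gender-specific analysis, the function would be called once on the boys' pairs and once on the girls' pairs. Note that MAE and the LoA are in years, whereas an MSE computed this way is in squared years.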

The model demonstrated acceptable accuracy. For boys, MSE was 0.55 years, MAE was 0.59 years, ICC was 0.74, and the 95% LoA ranged from -0.8 to 1.2 years. For girls, MSE was 0.59 years, MAE was 0.61 years, ICC was 0.82, and the 95% LoA ranged from -0.6 to 1.0 years. These results indicate stronger predictive accuracy for girls compared to boys.

Our findings demonstrate that the proposed deep learning model achieves reasonable accuracy in BA assessment, with stronger performance in girls compared to boys. However, the relatively wide 95% LoA, particularly for boys, and prediction errors at the extremes of the age range highlight the need for further refinement and validation. While the model shows potential as a supplementary tool for clinicians, future studies should focus on improving prediction accuracy, reducing variability, and validating the model on larger, more diverse datasets before considering widespread clinical implementation. Additionally, addressing edge cases and specific conditions that a human reviewer may detect but the model might overlook will be essential for enhancing its clinical reliability.

## Full-text entities

- **Diseases:** musculoskeletal disorders (MESH:D009140)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12085795/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12085795/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/PMC12085795/full.md

---
Source: https://tomesphere.com/paper/PMC12085795