# Diagnostic accuracy of artificial intelligence models for temporomandibular joint anomalies on MRI: a systematic review and meta-analysis

**Authors:** Abhimanyu Pradhan, Aakash Panda, Rajagopal Kadavigere, Neil Abraham Barnes, Suresh Sukumar, Ashwin Prabhu, Dilip Shettigar, Winniecia Dkhar

PMC · DOI: 10.1186/s12938-026-01525-6 · BioMedical Engineering OnLine · 2026-01-31

## TL;DR

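This systematic review and meta-analysis evaluates AI models for detecting temporomandibular joint (TMJ) anomalies on MRI. Deep learning architectures such as ResNet-18, Inception v3, and EfficientNet-b4 performed better and more consistently, but the evidence is highly heterogeneous and external validation is limited.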

## Contribution

The study provides a meta-analysis of AI diagnostic accuracy for TMJ anomalies on MRI and identifies high-performing model architectures.

## Key findings

- Pooled diagnostic accuracy of AI models for TMJ anomalies was 0.487 (95% CI 0.403–0.571), with substantial heterogeneity (I² > 90%).
- ResNet-18, Inception v3, and EfficientNet-b4 showed higher and more consistent diagnostic performance.
- Limited external validation and high heterogeneity hinder clinical translation of AI models.

## Abstract

Artificial intelligence (AI) techniques are increasingly applied to magnetic resonance imaging (MRI) for detecting temporomandibular joint (TMJ) anomalies; however, their overall diagnostic accuracy and generalizability remain uncertain.

This study aimed to systematically review and meta-analyse the diagnostic performance of AI models for TMJ anomaly detection on MRI and to identify factors influencing model performance.

A comprehensive search of PubMed, Scopus, Embase, and Web of Science was conducted for studies published between January 2015 and September 2025. Two reviewers independently screened and extracted data. Eligible studies developed and tested AI, machine learning, or deep learning models on human TMJ MRI and reported quantitative performance metrics. Risk of bias was assessed using the QUADAS-2 tool. Pooled sensitivity and specificity were estimated using a bivariate random-effects model, while pooled accuracy was derived using logit transformation. Heterogeneity (I²) was explored through subgroup analyses by model architecture and validation strategy.
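As a rough illustration of how a pooled accuracy can be derived on the logit scale and how I² is computed, the sketch below pools per-model accuracies with a univariate DerSimonian-Laird random-effects estimator. The abstract does not state which estimator or software the authors used, so the estimator choice, the 0.5 continuity correction, the function name `pool_logit_random_effects`, and the input format are all assumptions made for illustration. It does not reproduce the bivariate model used for sensitivity and specificity.

```python
import math


def pool_logit_random_effects(events, totals):
    """Pool proportions on the logit scale (DerSimonian-Laird, assumed estimator).

    events[i]: correctly classified cases for model i (hypothetical input)
    totals[i]: total cases evaluated for model i (hypothetical input)
    Returns (pooled, ci_low, ci_high, i_squared_percent).
    """
    logits, variances = [], []
    for e, n in zip(events, totals):
        # Continuity correction keeps the logit finite at 0% or 100% accuracy.
        if e == 0 or e == n:
            e, n = e + 0.5, n + 1.0
        logits.append(math.log(e / (n - e)))
        variances.append(1.0 / e + 1.0 / (n - e))

    # Fixed-effect (inverse-variance) weights, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    mean_fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum_w
    q = sum(wi * (yi - mean_fixed) ** 2 for wi, yi in zip(w, logits))
    df = len(logits) - 1

    # Between-study variance (tau^2) and I^2, both truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights and pooled logit with a 95% Wald interval.
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def expit(x):
        # Back-transform from the logit scale to a proportion.
        return 1.0 / (1.0 + math.exp(-x))

    return expit(mu), expit(mu - 1.96 * se), expit(mu + 1.96 * se), i2
```

Calling `pool_logit_random_effects([10, 90], [100, 100])` with these made-up counts gives a pooled estimate near 0.5 with a wide interval and I² above 90%. This shows how a mid-range pooled value can coexist with strongly divergent individual models.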

Fourteen studies were included in the systematic review, of which six met the criteria for meta-analysis. Across these six studies, 18 models were analyzed for accuracy, 29 for sensitivity, and 24 for specificity. The pooled diagnostic accuracy was 0.487 (95% CI 0.403–0.571), with pooled sensitivity and specificity of 0.399 (95% CI 0.348–0.450) and 0.399 (95% CI 0.343–0.456), respectively, all showing substantial heterogeneity (I² > 90%). Subgroup analyses indicated that advanced architectures such as ResNet-18, Inception v3, and EfficientNet-b4 achieved higher and more consistent diagnostic performance.

Advanced deep learning architectures such as ResNet-18, Inception v3, and EfficientNet-b4 demonstrated superior diagnostic performance for detecting temporomandibular joint anomalies on MRI. These findings highlight the potential of AI-assisted MRI interpretation to improve diagnostic consistency, efficiency, and early detection of TMJ pathology. However, substantial heterogeneity and limited external validation currently limit clinical translation. Standardized multicenter studies and transparent model validation are essential to ensure reliable integration of AI tools into clinical TMJ imaging workflows.

## Linked entities

- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Diseases:** TMJ anomaly (MESH:D013706)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12952169/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12952169/full.md

---
Source: https://tomesphere.com/paper/PMC12952169