# A Novel Multimodal Deep Image Analysis Model for Predicting Extraction/Non‐Extraction Decision

**Authors:** Sunna Imtiaz Ahmad, Jakub Olczyk, Adriel S. Araújo, João Pedro de Moura Medeiros, Vinicius C. Teixeira, Carlos F. A. Gomes, Maurício Cecílio Magnaguagno, Quinn Roederer, Vinicius Dutra, R. Scott Conley, Dalvan Griebler, George Eckert, Márcio Sarroglia Pinho, Hakan Turkkahraman

PMC · DOI: 10.1111/ocr.70057 · 2025-11-06

## TL;DR

This paper introduces a deep learning model that helps orthodontists decide whether to extract teeth by analyzing dental scans and X-rays.

## Contribution

A novel multimodal deep learning model combining intraoral scans and cephalometric radiographs for extraction decision support.

## Key findings

- The IOS + Land model achieved the highest accuracy (77%) and F1 score (0.62) for extraction prediction.
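- Multimodal models, particularly IOS + Land, outperformed single-modality models overall, although the single-modality Land model reached the highest sensitivity (82%) at the cost of lower specificity (57%).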
- The AE-only model was significantly less accurate than the multimodal IOS + AE (p = 0.048), IOS + Land (p = 0.006), and IOS + AE + Land (p = 0.005) models (McNemar's test).

## Abstract

This study aimed to develop a deep learning classifier capable of predicting the binary extraction/non‐extraction decision from lateral cephalometric radiographs (LCRs) and intraoral scans (IOS), to serve as an additional decision‐support tool for orthodontists.

The dataset comprised LCRs and IOS from 617 patients (mean age: 18.2 years, 63.5% female) treated at the Indiana University School of Dentistry. Subjects were categorised into two groups: extraction (192) and non‐extraction (425). Two sets of features were extracted from IOS: traditional arch measurements and novel tooth spatial features. For LCRs, features were derived using CephNet‐based landmark detection (Land) and a convolutional autoencoder (AE), with dimensionality reduced using Principal Component Analysis (PCA). Models were evaluated using accuracy, sensitivity, specificity, positive predictive value (PPV or precision), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR−), and F1 score.
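The evaluation metrics listed above all follow from the four cells of a binary confusion matrix, with extraction treated as the positive class. The following is a minimal sketch of those standard definitions; the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fn: extraction cases predicted as extraction / non-extraction.
    tn/fp: non-extraction cases predicted as non-extraction / extraction.
    Assumes each denominator below is non-zero, except where guarded.
    """
    sensitivity = tp / (tp + fn)          # recall for the extraction class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                  # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # LR+ is unbounded when specificity is exactly 1.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the study's results).
example = diagnostic_metrics(tp=30, fp=20, tn=40, fn=10)
```

Because F1 is the harmonic mean of PPV and sensitivity, a model can show high sensitivity and still score a modest F1 when its PPV is low, which is why the metrics are reported together.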

The IOS + Land model achieved the highest overall accuracy (77%) and F1 score (0.62), with strong specificity (83%) and PPV (62%). In contrast, the Land model yielded the highest sensitivity (82%), but at the cost of lower specificity (57%). McNemar's test revealed that the AE model was significantly less accurate than IOS + AE (p = 0.048), IOS + Land (p = 0.006), and IOS + AE + Land (p = 0.005).
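McNemar's test compares two classifiers evaluated on the same cases by looking only at the discordant pairs, that is, cases one model classifies correctly and the other does not. The sketch below uses the exact binomial form of the test; the abstract does not state which variant (exact or chi-square) the authors used, so that choice, the function name `mcnemar_exact`, and the example labels are assumptions for illustration.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where
      b = cases model A gets right and model B gets wrong,
      c = cases model A gets wrong and model B gets right.
    Under the null hypothesis of equal accuracy, b ~ Binomial(b + c, 0.5).
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Probability of a split at least as uneven as the one observed (one tail).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return b, c, min(1.0, 2 * tail)


# Hypothetical labels for illustration only (1 = extraction, 0 = non-extraction).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
model_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
model_b = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
b, c, p_value = mcnemar_exact(y_true, model_a, model_b)
```

Cases that both models get right or both get wrong do not enter the statistic, so two models with similar headline accuracy can still differ significantly if their errors fall on different patients.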

Deep learning models can predict the extraction/non‐extraction decision using IOS and LCRs with high accuracy and diagnostic performance. Multimodal approaches, particularly those integrating IOS with cephalometric landmarks, demonstrate superior accuracy, sensitivity, and specificity compared to single‐modality models.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12779187/full.md

---
Source: https://tomesphere.com/paper/PMC12779187