# Comparative Evaluation of Vision–Language Models for Detecting and Localizing Dental Lesions from Intraoral Images

**Authors:** Maria Jahan, Al Ibne Siam, Lamim Zakir Pronay, Saif Ahmed, Nabeel Mohammed, James Dudley, Taseef Hasan Farook

PMC · DOI: 10.3390/jimaging12010022 · Journal of Imaging · 2026-01-03

## TL;DR

This paper compares two vision–language models (Florence-2 and PaLI-Gemma) with the YOLOv8 object detector and finds that YOLOv8 performs best at detecting dental lesions in intraoral images.

## Contribution

The study evaluates Florence-2 and PaLI-Gemma against YOLOv8 for dental lesion detection on a dataset of 172 annotated intraoral images, and highlights the need for larger, more diverse datasets and hybrid architectures.

## Key findings

- YOLOv8 achieved the strongest overall performance, with a mean average precision (mAP) of 37% compared with 10% for Florence-2.
- Florence-2 and PaLI-Gemma underperformed YOLOv8 overall despite their multimodal capabilities.
- Larger, more diverse datasets are needed to improve model performance in dental imaging.
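
The precision and recall figures in these findings depend on how predicted boxes are matched to annotated lesions. This summary does not state the paper's exact matching protocol, so the following is only a minimal sketch of one common approach: greedy one-to-one matching at an intersection-over-union (IoU) threshold of 0.5 for a single class in a single image. The function names, the box format `(x1, y1, x2, y2)`, and the example boxes are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, iou_thr=0.5):
    """Greedy matching for one class in one image (illustrative only).

    preds:  list of (box, confidence) pairs
    truths: list of ground-truth boxes
    """
    matched = set()
    tp = 0
    # Visit predictions from most to least confident.
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, truth in enumerate(truths):
            if j in matched:
                continue  # each ground-truth box can be matched once
            v = iou(box, truth)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp
    fn = len(truths) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


# Hypothetical boxes: one correct detection, one false positive, one missed lesion.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
print(precision_recall(preds, truths))  # (0.5, 0.5)
```

Mean average precision builds on the same matching step: average precision is computed per class from the precision-recall curve over confidence thresholds, and mAP50-95 averages it over IoU thresholds from 0.5 to 0.95 in steps of 0.05.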

## Abstract

This study assessed the efficiency of vision–language models in detecting and classifying carious and non-carious lesions from intraoral photo imaging. A dataset of 172 annotated images was classified for microcavitation, cavitated lesions, staining, calculus, and non-carious lesions. Florence-2, PaLI-Gemma, and YOLOv8 models were trained on the dataset. The dataset was divided using an 80:10:10 split, and model performance was evaluated using mean average precision (mAP), mAP50-95, and class-specific precision and recall. YOLOv8 outperformed the vision–language models, achieving a mAP of 37% with a precision of 42.3% (with 100% for cavitation detection) and a recall of 31.3%. PaLI-Gemma produced a recall of 13% and 21%. Florence-2 yielded a mAP of 10%, with precision and recall of 51% and 35%, respectively. YOLOv8 achieved the strongest overall performance. The Florence-2 and PaLI-Gemma models underperformed relative to YOLOv8 despite their potential for multimodal contextual understanding, highlighting the need for larger, more diverse datasets and hybrid architectures to achieve improved performance.
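
To give a sense of scale for the 80:10:10 split on 172 images, here is a minimal sketch of a shuffled split. It assumes the three parts are training, validation, and test sets, that the split is a simple random one, and that any remainder goes to the last part. The abstract confirms none of these details, and the image identifiers and seed are hypothetical.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and split them roughly 80:10:10 (illustrative only)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 8 // 10  # floor of 80%
    n_val = n // 10        # floor of 10%
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the last part
    return train, val, test


# Hypothetical image identifiers; the paper's file names are not given here.
image_ids = [f"img_{i:03d}" for i in range(172)]
train, val, test = split_80_10_10(image_ids)
print(len(train), len(val), len(test))  # 137 17 18
```

With held-out sets of fewer than 20 images, a single missed or spurious detection can shift class-specific precision and recall by several percentage points, which is consistent with the abstract's call for larger datasets.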

## Full-text entities

- **Diseases:** Dental Lesions (MESH:D009057), lesions (MESH:D009059)
- **Chemicals:** Gemma (-)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12842643/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12842643/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/PMC12842643/full.md

---
Source: https://tomesphere.com/paper/PMC12842643