A benchmark multimodal oro-dental dataset for large vision-language models

Haoxin Lv; Ijazul Haq; Jin Du; Jiaxin Ma; Binnian Zhu; Xiaobing Dang; Chaoan Liang; Ruxu Du; Yingjie Zhang; Muhammad Saqib

arXiv:2511.04948·cs.CV·November 10, 2025

TL;DR

This paper introduces a large, annotated multimodal oro-dental dataset with images, radiographs, and textual records, and demonstrates its effectiveness by fine-tuning vision-language models for dental diagnosis and report generation.

Contribution

It provides a comprehensive, publicly available dataset for AI in dentistry and shows how fine-tuning large models improves diagnostic and reporting tasks.

Findings

01. Fine-tuned models outperform baselines in anomaly classification.

02. Models achieve significant improvements in diagnostic report generation.

03. Dataset enables effective training of vision-language models for dental applications.

Abstract

The advancement of artificial intelligence in oral healthcare relies on the availability of large-scale multimodal datasets that capture the complexity of clinical practice. In this paper, we present a comprehensive multimodal dataset, comprising 8775 dental checkups from 4800 patients collected over eight years (2018-2025), with patients ranging from 10 to 90 years of age. The dataset includes 50000 intraoral images, 8056 radiographs, and detailed textual records, including diagnoses, treatment plans, and follow-up notes. The data were collected under standard ethical guidelines and annotated for benchmarking. To demonstrate its utility, we fine-tuned state-of-the-art large vision-language models, Qwen-VL 3B and 7B, and evaluated them on two tasks: classification of six oro-dental anomalies and generation of complete diagnostic reports from multimodal inputs. We compared the fine-tuned…
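To put the reported counts in proportion, the short sketch below derives a few per-unit averages from the figures quoted in the abstract. The counts are taken from the abstract as stated; the helper name `per_unit` and the dictionary keys are illustrative only, and the results are plain means that say nothing about how checkups or images are actually distributed across patients.

```python
# Counts as reported in the abstract.
CHECKUPS = 8775
PATIENTS = 4800
INTRAORAL_IMAGES = 50000
RADIOGRAPHS = 8056


def per_unit(total: int, units: int) -> float:
    """Return the mean number of items per unit, rounded to two decimals."""
    if units <= 0:
        raise ValueError("units must be positive")
    return round(total / units, 2)


# Simple averages only; the real per-patient distribution is not given here.
summary = {
    "checkups_per_patient": per_unit(CHECKUPS, PATIENTS),
    "intraoral_images_per_checkup": per_unit(INTRAORAL_IMAGES, CHECKUPS),
    "radiographs_per_checkup": per_unit(RADIOGRAPHS, CHECKUPS),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

On these numbers, a patient has a little under two checkups on average, and a checkup carries several intraoral images but slightly fewer than one radiograph.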

Code & Models

Datasets

zirak-ai/COde
dataset · 50 downloads

Taxonomy

Topics: Dental Radiography and Imaging · Dental Research and COVID-19 · COVID-19 diagnosis using AI