Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage
Ziyi He, Yushi Feng, Shuangyu Yang, Yinghao Zhu, Xichen Zhang, Pak Chuen Patrick Tai, Hei Yuet Lo, Songying Wu, Weifa Yang, Lequan Yu

TL;DR
Dental-TriageBench is a new benchmark for multimodal dental triage reasoning that exposes the gap between current models and human dentists.
Contribution
It introduces the first expert-annotated multimodal dental triage benchmark, with hierarchical triage labels and expert-authored reasoning trajectories, to support the development of AI triage systems.
Findings
Models lag behind human dentists in fine-grained, treatment-level triage accuracy.
Both patient complaints and radiographic (OPG) evidence are needed for accurate triage.
Errors often occur in cases with multiple referral domains, with models producing overly narrow or incomplete referrals.
Abstract
Dental triage is a safety-critical clinical routing task that requires integrating multimodal clinical information (e.g., patient complaints and radiographic evidence) to determine complete referral plans. We present Dental-TriageBench, the first expert-annotated benchmark for reasoning-driven multimodal dental triage. Built from authentic outpatient workflows, it contains 246 de-identified cases annotated with expert-authored golden reasoning trajectories, together with hierarchical triage labels. We benchmark 19 proprietary, open-source, and medical-domain MLLMs against three junior dentists serving as the human baseline and find a substantial human-model gap on fine-grained, treatment-level triage. Further analyses show that accurate triage requires both the patient complaint and the OPG (orthopantomogram, a panoramic dental radiograph), and that model errors concentrate on cases with multiple referral domains, where MLLMs tend to…
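To make the "overly narrow or incomplete referral" failure mode concrete, here is a minimal sketch, not the benchmark's actual evaluation protocol, that compares a model's predicted set of referral domains with the gold set for a single case. The function names and domain labels (`endodontics`, `periodontics`, `oral_surgery`) are hypothetical placeholders, not Dental-TriageBench's label taxonomy.

```python
def classify_referral(predicted: set[str], gold: set[str]) -> str:
    """Label a predicted referral set relative to the gold set (hypothetical scheme)."""
    if predicted == gold:
        return "exact"
    if predicted < gold:
        # Proper subset: the model dropped at least one required domain (too narrow).
        return "under-referral"
    if predicted > gold:
        # Proper superset: every gold domain is covered, plus extra ones.
        return "over-referral"
    # Some gold domains are missing and some wrong domains were added.
    return "mismatch"


def referral_recall(predicted: set[str], gold: set[str]) -> float:
    """Fraction of gold referral domains that the prediction covers."""
    if not gold:
        return 1.0
    return len(predicted & gold) / len(gold)


if __name__ == "__main__":
    # Hypothetical multi-domain case: the gold plan needs two referral domains.
    gold = {"endodontics", "periodontics"}
    pred = {"endodontics"}
    print(classify_referral(pred, gold))  # under-referral
    print(referral_recall(pred, gold))    # 0.5
```

Separating "under-referral" from "mismatch" in this way is one choice among several; it isolates predictions that are correct but incomplete, which is the pattern the findings above attribute to models on multi-domain cases.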
