Bridging the Knowledge-Action Gap by Evaluating LLMs in Dynamic Dental Clinical Scenarios

Hongyang Ma; Tiantian Gu; Huaiyuan Sun; Huilin Zhu; Yongxin Wang; Jie Li; Wubin Sun; Zeliang Lian; Yinghong Zhou; Yi Gao; Shirui Wang; Zhihui Tang

arXiv:2601.12974·cs.CL·January 21, 2026

Open Access

TL;DR

This paper evaluates large language models in dynamic dental clinical scenarios. The models are strong on static tasks but struggle with active information gathering and decision-making, which points to a need for domain-adaptive training.

Contribution

Introduces the SCMPE benchmark for assessing LLMs in dynamic dental clinical workflows and analyzes their performance limitations and the impact of retrieval-augmented generation.

Findings

01

Models excel in static tasks but struggle with dynamic dialogues.

02

Retrieval-augmented generation reduces hallucinations in static tasks.

03

External knowledge alone is insufficient for safe autonomous clinical decision-making.

Abstract

The transition of Large Language Models (LLMs) from passive knowledge retrievers to autonomous clinical agents demands a shift in evaluation, from static accuracy to dynamic behavioral reliability. To explore this boundary in dentistry, a domain where high-quality AI advice uniquely empowers patient-participatory decision-making, we present the Standardized Clinical Management & Performance Evaluation (SCMPE) benchmark, which comprehensively assesses performance from knowledge-oriented evaluations (static objective tasks) to workflow-based simulations (multi-turn simulated patient interactions). Our analysis reveals that while models demonstrate high proficiency in static objective tasks, their performance drops sharply in dynamic clinical dialogues, indicating that the primary bottleneck lies not in knowledge retention but in the critical challenges of active information gathering and…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Explainable Artificial Intelligence (XAI)