Predicting LLM Correctness in Prosthodontics Using Metadata and Hallucination Signals

Lucky Susanto; Anasta Pranawijayana; Cortino Sukotjo; Soni Prasad; Derry Wijaya

arXiv:2512.22508·cs.LG·December 30, 2025

TL;DR

This paper studies whether metadata and hallucination signals can predict if an LLM's answer on a multiple-choice prosthodontics exam is correct. It reports modest accuracy improvements over a baseline and offers insights into model behavior and reliability signals.

Contribution

The paper combines metadata and hallucination signals to predict LLM correctness in a high-stakes medical domain, and it highlights the impact of prompting strategies.

Findings

01. Metadata-based approach improves accuracy by up to 7.14%

02. Achieves a precision of 83.12% over a baseline

03. Hallucination signals are strong indicators of incorrectness

Abstract

Large language models (LLMs) are increasingly adopted in high-stakes domains such as healthcare and medical education, where the risk of generating factually incorrect (i.e., hallucinated) information is a major concern. While significant efforts have been made to detect and mitigate such hallucinations, predicting whether an LLM's response is correct remains a critical yet underexplored problem. This study investigates the feasibility of predicting correctness by analyzing a general-purpose model (GPT-4o) and a reasoning-centric model (OSS-120B) on a multiple-choice prosthodontics exam. We utilize metadata and hallucination signals across three distinct prompting strategies to build a correctness predictor for each (model, prompting) pair. Our findings demonstrate that this metadata-based approach can improve accuracy by up to +7.14% and achieve a precision of 83.12% over a baseline…
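
To make the idea of one correctness predictor per (model, prompting) pair concrete, here is a minimal sketch in Python. It is not the authors' method: the predictor is a stand-in one-feature threshold rule, the `hallucination_score` feature, the `prompt_a` label, and all data values are hypothetical, and the "always predict correct" baseline is an assumption, since the abstract does not say which baseline was used. Metrics are computed in-sample, with no held-out split, purely to keep the example short.

```python
from collections import defaultdict

# Each record: (model, prompting, hallucination_score, is_correct).
# All values are made up for illustration; they are not the paper's data.
# "prompt_a" is a placeholder name for one prompting strategy.
RECORDS = [
    ("GPT-4o", "prompt_a", 0.10, True),
    ("GPT-4o", "prompt_a", 0.20, True),
    ("GPT-4o", "prompt_a", 0.70, False),
    ("GPT-4o", "prompt_a", 0.30, True),
    ("GPT-4o", "prompt_a", 0.80, False),
    ("OSS-120B", "prompt_a", 0.15, True),
    ("OSS-120B", "prompt_a", 0.60, False),
    ("OSS-120B", "prompt_a", 0.25, True),
    ("OSS-120B", "prompt_a", 0.90, False),
]


def accuracy(preds, labels):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def precision(preds, labels):
    """Of the responses predicted correct, the fraction that truly are."""
    predicted_pos = [y for p, y in zip(preds, labels) if p]
    return sum(predicted_pos) / len(predicted_pos) if predicted_pos else 0.0


def fit_threshold(scores, labels):
    """Pick the cutoff with the best in-sample accuracy.

    A response is predicted correct when its score is <= the cutoff.
    Ties are resolved in favor of the smallest cutoff.
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: accuracy([s <= t for s in scores], labels))


def evaluate_per_pair(records):
    """Fit and score one threshold predictor per (model, prompting) pair."""
    groups = defaultdict(list)
    for model, prompting, score, ok in records:
        groups[(model, prompting)].append((score, ok))

    results = {}
    for pair, rows in groups.items():
        scores = [s for s, _ in rows]
        labels = [ok for _, ok in rows]
        cutoff = fit_threshold(scores, labels)
        preds = [s <= cutoff for s in scores]
        baseline = [True] * len(labels)  # assumed baseline: always predict "correct"
        results[pair] = {
            "cutoff": cutoff,
            "accuracy": accuracy(preds, labels),
            "precision": precision(preds, labels),
            "baseline_accuracy": accuracy(baseline, labels),
        }
    return results


if __name__ == "__main__":
    for pair, metrics in evaluate_per_pair(RECORDS).items():
        print(pair, metrics)
```

Fitting a separate predictor per pair, rather than one pooled model, lets each cutoff adapt to how a given model behaves under a given prompting strategy. In practice one would evaluate on held-out questions and use richer features than a single score.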

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Topic Modeling