LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic

Weibing Zheng; Laurah Turner; Jess Kropczynski; Murat Ozer; Tri Nguyen; and Shane Halse

arXiv:2506.11221·cs.AI·June 16, 2025

Open Access · 1 Repo

TL;DR

This paper introduces an approach that combines fuzzy logic with large language models to automate clinical skill assessment and align it with physician judgment, reporting over 80% accuracy in medical education evaluations.

Contribution

It presents a method for fine-tuning LLMs on fuzzy-logic-based human annotations to improve automated clinical evaluation in medical training.

Findings

01 Achieved over 80% accuracy in evaluation tasks

02 Major criteria items scored over 90% accuracy

03 Demonstrated effective alignment with human judgment

Abstract

Clinical communication skills are critical in medical education, yet practicing and assessing them at scale is challenging. Although LLM-powered clinical scenario simulations have shown promise in enhancing medical students' clinical practice, providing automated and scalable clinical evaluation that follows nuanced physician judgment is difficult. This paper combines fuzzy logic with a Large Language Model (LLM) and proposes LLM-as-a-Fuzzy-Judge to address the challenge of aligning the automated evaluation of medical students' clinical skills with physicians' subjective preferences. LLM-as-a-Fuzzy-Judge is an approach in which an LLM is fine-tuned to evaluate medical students' utterances within student-AI patient conversation scripts, based on human annotations from four fuzzy sets, including Professionalism, Medical Relevance, Ethical Behavior, and Contextual…
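One way to read the reported accuracy figures is as per-criterion agreement between the judge's labels and the physician annotations. The following is a minimal sketch under that assumption, not the authors' evaluation code. The criterion names are the three that appear in full in the abstract (the fourth is truncated in the source and is left out). The label values (`low`, `medium`, `high`), the records, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (criterion, physician_label, judge_label).
# Criterion names come from the abstract; label values are invented for illustration.
RECORDS = [
    ("Professionalism", "high", "high"),
    ("Professionalism", "medium", "medium"),
    ("Medical Relevance", "high", "high"),
    ("Medical Relevance", "low", "medium"),
    ("Ethical Behavior", "high", "high"),
    ("Ethical Behavior", "high", "high"),
]


def accuracy_by_criterion(records):
    """Return {criterion: fraction of items where the judge label equals the physician label}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for criterion, human, judge in records:
        totals[criterion] += 1
        hits[criterion] += int(human == judge)
    return {c: hits[c] / totals[c] for c in totals}


def overall_accuracy(records):
    """Return the fraction of all items where judge and physician labels match."""
    return sum(human == judge for _, human, judge in records) / len(records)


if __name__ == "__main__":
    for criterion, acc in accuracy_by_criterion(RECORDS).items():
        print(f"{criterion}: {acc:.2f}")
    print(f"Overall: {overall_accuracy(RECORDS):.2f}")
```

Reporting agreement per criterion as well as overall makes it visible when some criteria are matched more reliably than others, which is the distinction the findings draw between the overall and major-criteria figures.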

Code & Models

Repositories

2sigmaedtech/llmasajudge
pytorch · Official

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques

Methods: ALIGN