Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation

Shunfan Zheng; Xiechi Zhang; Gerard de Melo; Xiaoling Wang; Linlin Wang

arXiv:2501.06741·cs.CL·January 14, 2025

TL;DR

This paper introduces HDCEval, a hierarchical evaluation framework for medical LLMs that decomposes complex tasks into subtasks evaluated by expert models, improving alignment with human judgment in clinical settings.

Contribution

The paper presents a novel hierarchical evaluation framework with fine-grained medical guidelines and expert model training, enhancing the accuracy of LLM assessments in healthcare.

Findings

01  HDCEval improves alignment with human evaluators in medical assessments.

02  Hierarchical decomposition enhances evaluation precision across multiple medical criteria.

03  Expert model training via Attribute-Driven Token Optimization boosts evaluation reliability.

Abstract

In the rapidly evolving landscape of large language models (LLMs) for medical applications, ensuring the reliability and accuracy of these models in clinical settings is paramount. Existing benchmarks often focus on fixed-format tasks like multiple-choice QA, which fail to capture the complexity of real-world clinical diagnostics. Moreover, traditional evaluation metrics and LLM-based evaluators struggle with misalignment, often providing oversimplified assessments that do not adequately reflect human judgment. To address these challenges, we introduce HDCEval, a Hierarchical Divide-and-Conquer Evaluation framework tailored for fine-grained alignment in medical evaluation. HDCEval is built on a set of fine-grained medical evaluation guidelines developed in collaboration with professional doctors, encompassing Patient Question Relevance, Medical Knowledge Correctness, and Expression. The…
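The divide-and-conquer idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three criterion names come from the abstract, but the `evaluate` function, the `Scorer` type, the 0-to-1 score range, and the unweighted mean used for the overall score are all assumptions made for illustration. In the framework itself, each scorer would presumably be a trained expert model rather than a plain function.

```python
from statistics import mean
from typing import Callable, Dict

# Top-level criteria named in the abstract.
CRITERIA = (
    "Patient Question Relevance",
    "Medical Knowledge Correctness",
    "Expression",
)

# Hypothetical interface: an "expert" maps (question, answer) to a score.
# The 0..1 range is an assumption, not something the source specifies.
Scorer = Callable[[str, str], float]


def evaluate(question: str, answer: str, experts: Dict[str, Scorer]) -> Dict[str, float]:
    """Score an answer per criterion, then aggregate.

    Divide: each criterion is handled by its own expert scorer.
    Conquer: per-criterion scores are combined into one overall score.
    The unweighted mean is an illustrative choice only.
    """
    missing = [c for c in CRITERIA if c not in experts]
    if missing:
        raise KeyError(f"no expert registered for: {missing}")

    per_criterion = {c: experts[c](question, answer) for c in CRITERIA}
    result = dict(per_criterion)
    result["overall"] = mean(per_criterion.values())
    return result
```

Keeping one scorer per criterion means a weak score can be traced back to the specific aspect that caused it, which is the kind of fine-grained feedback a single holistic judge does not provide.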

Taxonomy

Methods: Sparse Evolutionary Training · Focus