MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs

Jianhui Wei; Zijie Meng; Zikai Xiao; Tianxiang Hu; Yang Feng; Zhijie Zhou; Jian Wu; Zuozhu Liu

arXiv:2506.22808·cs.CL·July 1, 2025

TL;DR

MedEthicsQA is a benchmark of 5,623 multiple-choice and 5,351 open-ended questions for evaluating the ethical reasoning of medical large language models; it reveals their current limitations in understanding medical ethics.

Contribution

This paper introduces MedEthicsQA, the first large-scale, hierarchically structured benchmark for assessing medical ethics in LLMs. It draws on widely used medical datasets, authoritative question banks, and scenarios derived from PubMed literature, and is validated by experts.

Findings

01

State-of-the-art MedLLMs perform poorly on ethics questions.

02

The benchmark reveals significant gaps in medical ethics alignment.

03

The dataset is expert-validated and has a low error rate (2.72%).

Abstract

While Medical Large Language Models (MedLLMs) have demonstrated remarkable potential in clinical tasks, their ethical safety remains insufficiently explored. This paper introduces MedEthicsQA, a comprehensive benchmark comprising 5,623 multiple-choice questions and 5,351 open-ended questions for evaluating medical ethics in LLMs. We systematically establish a hierarchical taxonomy integrating global medical ethical standards. The benchmark encompasses widely used medical datasets, authoritative question banks, and scenarios derived from PubMed literature. Rigorous quality control involving multi-stage filtering and multi-faceted expert validation ensures the reliability of the dataset, with a low error rate (2.72%). Evaluation of state-of-the-art MedLLMs shows declined performance in answering medical ethics questions compared to their foundation…

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Artificial Intelligence in Healthcare and Education