MedEthicsQA: A Comprehensive Question Answering Benchmark for Medical Ethics Evaluation of LLMs
Jianhui Wei, Zijie Meng, Zikai Xiao, Tianxiang Hu, Yang Feng, Zhijie Zhou, Jian Wu, Zuozhu Liu

TL;DR
MedEthicsQA is a new comprehensive benchmark with thousands of questions designed to evaluate the ethical reasoning of medical large language models, revealing their current limitations in medical ethics understanding.
Contribution
This paper introduces MedEthicsQA, the first large-scale, hierarchically structured benchmark for assessing medical ethics in LLMs, combining multiple datasets and expert validation.
Findings
State-of-the-art MedLLMs perform poorly on ethics questions.
The benchmark reveals significant gaps in medical ethics alignment.
The dataset is expert-validated and has a low error rate.
Abstract
While Medical Large Language Models (MedLLMs) have demonstrated remarkable potential in clinical tasks, their ethical safety remains insufficiently explored. This paper introduces MedEthicsQA, a comprehensive benchmark comprising multiple-choice questions and open-ended questions for evaluating medical ethics in LLMs. We systematically establish a hierarchical taxonomy integrating global medical ethical standards. The benchmark encompasses widely used medical datasets, authoritative question banks, and scenarios derived from PubMed literature. Rigorous quality control involving multi-stage filtering and multi-faceted expert validation ensures the reliability of the dataset, with a low error rate. Evaluations show that state-of-the-art MedLLMs perform worse on medical ethics questions than their foundation…
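To make the evaluation setup concrete, here is a minimal sketch of how multiple-choice answers could be scored overall and per taxonomy category. It is not the authors' evaluation code: the record keys ("id", "category", "answer"), the function name accuracy_by_category, and the category labels are all assumptions chosen for illustration, and the toy records are made up rather than drawn from the benchmark.

```python
from collections import defaultdict


def accuracy_by_category(items, predictions):
    """Return (overall accuracy, per-category accuracy) for multiple-choice items.

    items: list of dicts with hypothetical keys "id", "category", "answer".
    predictions: dict mapping an item id to the predicted option letter.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        category = item["category"]
        total[category] += 1
        # A missing prediction counts as wrong; comparison ignores case and spaces.
        predicted = predictions.get(item["id"], "").strip().upper()
        if predicted == item["answer"].strip().upper():
            correct[category] += 1
    per_category = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values()) if total else 0.0
    return overall, per_category


# Toy, made-up records for illustration only (hypothetical categories).
items = [
    {"id": "q1", "category": "informed consent", "answer": "B"},
    {"id": "q2", "category": "informed consent", "answer": "D"},
    {"id": "q3", "category": "confidentiality", "answer": "A"},
]
predictions = {"q1": "b", "q2": "A", "q3": "A"}
overall, per_category = accuracy_by_category(items, predictions)
```

Reporting accuracy per category as well as overall fits a hierarchically organized benchmark, since a single aggregate score can hide weak spots in individual ethical topics. Open-ended questions would need a different scoring method, which this sketch does not cover.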
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling · Artificial Intelligence in Healthcare and Education
