Benchmarking Ethical and Safety Risks of Healthcare LLMs in China-Toward Systemic Governance under Healthy China 2030

Mouxiao Bian; Rongzhao Zhang; Chao Ding; Xinwei Peng; Jie Xu

arXiv:2505.07205·cs.CL·May 13, 2025

TL;DR

This paper introduces a comprehensive benchmark to evaluate ethical and safety risks of Chinese healthcare LLMs, revealing performance gaps and systemic governance issues, and proposes a practical framework for improved oversight.

Contribution

It presents a large-scale ethical and safety benchmark for Chinese medical LLMs, evaluates current models, and proposes a systemic governance framework for safe deployment.

Findings

01

Baseline accuracy of 42.7% (Qwen 2.5-32B) on ethics and safety tasks

02

Fine-tuning on the benchmark data improves accuracy to as much as 50.8%

03

Identifies systemic governance gaps in Chinese healthcare AI

Abstract

Large Language Models (LLMs) are poised to transform healthcare under China's Healthy China 2030 initiative, yet they introduce new ethical and patient-safety challenges. To evaluate these risks quantitatively, we present a novel 12,000-item Q&A benchmark covering 11 ethics and 9 safety dimensions in medical contexts. Using this dataset, we assess state-of-the-art Chinese medical LLMs (e.g., Qwen 2.5-32B, DeepSeek), revealing moderate baseline performance (42.7% accuracy for Qwen 2.5-32B) and significant improvements after fine-tuning on our data (up to 50.8% accuracy). Results show notable gaps in LLM decision-making on ethics and safety scenarios, reflecting insufficient institutional oversight. We then identify systemic governance shortfalls (including the lack of fine-grained ethical audit protocols, slow adaptation by hospital IRBs, and insufficient evaluation tools) that currently…
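As a rough illustration of how accuracy figures like those in the abstract might be computed, the sketch below scores model answers against a gold key and breaks accuracy down by dimension. Everything about the item format is an assumption: the abstract does not say whether items are multiple-choice, so the `dimension` and `answer` fields, the dimension names, and the toy data are all hypothetical. The last lines only restate the two reported figures (42.7% and 50.8%) as a gain in percentage points and in relative terms; the abstract gives 50.8% as an upper bound ("up to"), so the gain is an upper bound as well.

```python
from collections import defaultdict


def accuracy_by_dimension(items, predictions):
    """Return (overall_accuracy, {dimension: accuracy}).

    Assumes each item is a dict with a 'dimension' label and a gold 'answer';
    this schema is hypothetical, not taken from the paper.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        dim = item["dimension"]
        total[dim] += 1
        if pred == item["answer"]:
            correct[dim] += 1
    per_dim = {d: correct[d] / total[d] for d in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_dim


# Toy items invented for illustration; they are not from the benchmark.
items = [
    {"dimension": "ethics/informed_consent", "answer": "B"},
    {"dimension": "ethics/informed_consent", "answer": "A"},
    {"dimension": "safety/medication", "answer": "C"},
    {"dimension": "safety/medication", "answer": "D"},
]
predictions = ["B", "C", "C", "D"]  # hypothetical model outputs

overall, per_dim = accuracy_by_dimension(items, predictions)

# Figures reported in the abstract (percent accuracy).
baseline, finetuned = 42.7, 50.8
gain_points = round(finetuned - baseline, 1)                        # percentage points
relative_gain = round(100 * (finetuned - baseline) / baseline, 1)   # percent of baseline

print(f"overall={overall:.2f}, per_dimension={per_dim}")
print(f"gain: {gain_points} points ({relative_gain}% relative)")
```

Reporting per-dimension accuracy alongside the overall number matters for a benchmark with 11 ethics and 9 safety dimensions, since a single aggregate can hide dimensions where a model performs much worse than average.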

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Ethics and Social Impacts of AI