Ethical Risks in Deploying Large Language Models: An Evaluation of Medical Ethics Jailbreaking
Chutian Huang, Dake Cao, Jiacheng Ji, Yunlou Fan, Chengze Yan, Hanhui Xu

TL;DR
This paper evaluates the vulnerability of seven large language models to medical ethics jailbreak attacks within the Chinese context, revealing high susceptibility and proposing security improvements.
Contribution
It introduces a specialized evaluation framework for Chinese medical ethics jailbreaks and systematically assesses model resilience against sophisticated adversarial prompts.
Findings
The jailbreak success rate reached 82.1% across the tested models.
Most models failed to resist contextual manipulation in medical ethics.
Claude-Sonnet-4-Reasoning was the most robust among tested models.
Abstract
Background: While Large Language Models (LLMs) have achieved widespread adoption, malicious prompt engineering, specifically "jailbreak attacks," poses severe security risks by inducing models to bypass internal safety mechanisms. Current benchmarks predominantly focus on public safety and Western cultural norms, leaving a critical gap in evaluating the niche, high-risk domain of medical ethics within the Chinese context. Objective: To establish a specialized jailbreak evaluation framework for Chinese medical ethics and to systematically assess the defensive resilience and ethical alignment of seven prominent LLMs when subjected to sophisticated adversarial simulations. Methodology: We evaluated seven prominent models (e.g., GPT-5, Claude-Sonnet-4-Reasoning, DeepSeek-R1) using a "role-playing + scenario simulation + multi-turn dialogue" vector within the DeepInception framework. The…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
