Ethical Risks in Deploying Large Language Models: An Evaluation of Medical Ethics Jailbreaking

Chutian Huang; Dake Cao; Jiacheng Ji; Yunlou Fan; Chengze Yan; Hanhui Xu

arXiv:2601.12652 · cs.CY · January 21, 2026

Open Access

TL;DR

This paper evaluates the vulnerability of seven large language models to medical ethics jailbreak attacks within the Chinese context, revealing high susceptibility and proposing security improvements.

Contribution

It introduces a specialized evaluation framework for Chinese medical ethics jailbreaks and systematically assesses model resilience against sophisticated adversarial prompts.

Findings

01

Jailbreak success rate reached 82.1% across models.

02

Most models failed to resist contextual manipulation in medical ethics.

03

Claude-Sonnet-4-Reasoning was the most robust among tested models.
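
A jailbreak success rate such as the one reported in the findings above is typically the fraction of attack attempts judged successful. The sketch below shows that arithmetic, assuming each trial is labeled as a success or failure and that the overall rate is pooled over all trials; the page does not state how the paper aggregated its figure, so this pooling is an assumption. The function name `success_rates`, the model names, and the labels are hypothetical and are not data from the paper.

```python
from collections import defaultdict


def success_rates(records):
    """Compute per-model and pooled jailbreak success rates.

    records: iterable of (model_name, jailbroken) pairs, where jailbroken
    is True if the attack attempt was judged successful.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for model, jailbroken in records:
        totals[model] += 1
        if jailbroken:
            hits[model] += 1

    # Rate per model: successful attempts divided by all attempts on that model.
    per_model = {m: hits[m] / totals[m] for m in totals}

    # Pooled rate over every trial (an assumption; a paper could instead
    # average the per-model rates, which differs when trial counts differ).
    n_trials = sum(totals.values())
    overall = sum(hits.values()) / n_trials if n_trials else 0.0
    return per_model, overall


# Made-up labels for two hypothetical models, four trials each.
records = [
    ("model_a", True), ("model_a", True), ("model_a", True), ("model_a", False),
    ("model_b", True), ("model_b", False), ("model_b", False), ("model_b", False),
]
per_model, overall = success_rates(records)
```

With equal trial counts per model, the pooled rate and the mean of per-model rates coincide; with unequal counts they can diverge, which is why the aggregation choice is worth stating alongside any headline figure.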

Abstract

Background: While Large Language Models (LLMs) have achieved widespread adoption, malicious prompt engineering, specifically "jailbreak attacks," poses severe security risks by inducing models to bypass their internal safety mechanisms. Current benchmarks predominantly focus on public safety and Western cultural norms, leaving a critical gap in evaluating the niche, high-risk domain of medical ethics within the Chinese context. Objective: To establish a specialized jailbreak evaluation framework for Chinese medical ethics and to systematically assess the defensive resilience and ethical alignment of seven prominent LLMs when subjected to sophisticated adversarial simulations. Methodology: We evaluated seven prominent models (e.g., GPT-5, Claude-Sonnet-4-Reasoning, DeepSeek-R1) using a "role-playing + scenario simulation + multi-turn dialogue" vector within the DeepInception framework. The…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)