Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs

Jingyuan Xie; Wenjie Wang; Ji Wu; Jiandong Gao

arXiv:2603.02262 · cs.CR · March 4, 2026

Open Access

TL;DR

This paper introduces a stealthy poisoning attack on the supervised fine-tuning of medical LLMs: poisoned rationales injected into the few-shot training data significantly degrade accuracy on targeted medical topics.

Contribution

It presents a novel rationale poisoning method for medical LLMs during supervised fine-tuning, highlighting a new security risk and the need for defense strategies.

Findings

01

Poisoned rationales cause a significant accuracy decline on the target subject, as long as no correct samples of the same subject appear in the dataset.

02

Knowledge overwriting is ineffective as a poisoning strategy, whereas rationale poisoning is effective.

03

An effective and stealthy attack requires a minimum number and ratio of poisoned samples.

Abstract

Supervised fine-tuning (SFT) is essential for the development of medical large language models (LLMs), yet prior poisoning studies have mainly focused on detectable backdoor attacks. We propose a novel poisoning attack targeting the reasoning process of medical LLMs during SFT. Unlike backdoor attacks, our method injects poisoned rationales into few-shot training data, leading to stealthy degradation of model performance on targeted medical topics. Results showed that knowledge overwriting was ineffective, while rationale poisoning caused a significant decline in the accuracy of the target subject, as long as no correct samples of the same subject appeared in the dataset. A minimum number and ratio of poisoned samples were needed to carry out an effective and stealthy attack, which was more efficient and accurate than catastrophic forgetting. We demonstrate through this study the risk of…
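The abstract's central measurement is a decline in accuracy on one target subject. The truncated abstract does not describe the evaluation protocol, so the following is only a minimal sketch of how a per-subject accuracy drop could be computed when auditing a fine-tuned model. The record layout `(topic, predicted, gold)`, the function names, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def per_topic_accuracy(records):
    """Return accuracy per topic for (topic, predicted, gold) records.

    The record layout is a hypothetical choice for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for topic, predicted, gold in records:
        total[topic] += 1
        correct[topic] += int(predicted == gold)
    return {topic: correct[topic] / total[topic] for topic in total}


def accuracy_drop(before, after):
    """Per-topic accuracy change (before minus after) for shared topics."""
    acc_before = per_topic_accuracy(before)
    acc_after = per_topic_accuracy(after)
    return {
        topic: acc_before[topic] - acc_after[topic]
        for topic in acc_before
        if topic in acc_after
    }


# Toy, made-up answers on the same four questions before and after fine-tuning.
baseline = [
    ("cardiology", "A", "A"),
    ("cardiology", "B", "B"),
    ("neurology", "C", "C"),
    ("neurology", "D", "A"),
]
finetuned = [
    ("cardiology", "A", "A"),
    ("cardiology", "C", "B"),
    ("neurology", "C", "C"),
    ("neurology", "D", "A"),
]

drops = accuracy_drop(baseline, finetuned)
for topic, drop in sorted(drops.items()):
    print(f"{topic}: accuracy drop = {drop:.2f}")
```

Reporting the change per subject rather than as one overall score is the design choice here: a decline confined to a single subject can be diluted in an aggregate accuracy figure.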

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education