Research on Security Enhancement Methods for Adversarial Robust Large Language Model Intelligent Agents for Medical Decision-Making Tasks

Saisai Hu

arXiv:2605.08257·cs.CR·May 12, 2026

TL;DR

This paper introduces a full-link security framework for medical decision-making LLM agents, aimed at improving adversarial robustness and trustworthiness; the reported attack success rate falls to 8.7% under various attacks.

Contribution

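It proposes ARSM-Agent, which combines a weighted joint objective (decision accuracy, adversarial robustness, safety refusal, and knowledge consistency losses) with multi-module collaboration, and reports better security and robustness than four baseline agents.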

Findings

01. ARSM-Agent reduces the attack success rate to 8.7% under various attacks.

02. The approach achieves a knowledge consistency score of 0.91.

03. Ablation studies show that each module contributes to both security and accuracy.

Abstract

To improve the adversarial robustness, security, and trustworthiness of intelligent agents for medical decision-making, this study develops a full-link security enhancement framework that follows the pipeline "input risk perception - medical evidence constraint - knowledge consistency verification - decision confidence reweighting - security output control - adversarial feedback update." We propose ARSM-Agent and define a weighted joint objective consisting of decision accuracy loss, adversarial robustness loss, safety refusal loss, and knowledge consistency loss, with weights of 0.3, 0.3, 0.2, and 0.2, respectively. The full medical decision formulation is implemented through the collaborative linkage of multiple modules. We verify that the algorithm is more efficient than four baselines: LLM-Agent, Retrieval-Agent, Filter-Agent, and Adv-Train-Agent. Under semantic perturbation, prompt…
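The weighted joint objective described in the abstract can be illustrated with a minimal Python sketch. The abstract gives only the four loss terms and their weights; the linear weighted-sum form, the dictionary keys, and the function name `joint_objective` are assumptions made for illustration and may not match the paper's actual formulation.

```python
# Weights as stated in the abstract. The key names are hypothetical labels,
# not identifiers taken from the paper.
WEIGHTS = {
    "decision_accuracy": 0.3,
    "adversarial_robustness": 0.3,
    "safety_refusal": 0.2,
    "knowledge_consistency": 0.2,
}


def joint_objective(losses, weights=WEIGHTS):
    """Combine component losses into one scalar.

    Assumes a plain weighted sum; the abstract does not state the exact form.
    """
    missing = set(weights) - set(losses)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * losses[name] for name in weights)


# Example with made-up component loss values (not results from the paper).
example_losses = {
    "decision_accuracy": 0.5,
    "adversarial_robustness": 0.2,
    "safety_refusal": 0.1,
    "knowledge_consistency": 0.4,
}
total = joint_objective(example_losses)
```

Because the stated weights sum to 1.0, the combined value under this assumed form is a convex combination of the four terms, so it stays on the same scale as the individual losses.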
