F2A: An Innovative Approach for Prompt Injection by Utilizing Feign Security Detection Agents

Yupeng Ren

arXiv:2410.08776·cs.CR·October 15, 2024

Open Access

TL;DR

This paper introduces F2A, a novel attack exploiting LLMs' blind trust in safety detection agents, demonstrating how malicious fake results can hijack conversations and proposing solutions to enhance LLM security.

Contribution

The paper presents the Feign Agent Attack (F2A), revealing a new vulnerability in LLM safety mechanisms and offering strategies to mitigate this security risk.

Findings

01

F2A can successfully hijack LLM conversations using fake safety results

02

LLMs tend to trust safety detection outputs without critical evaluation

03

Proposed solutions improve LLM robustness against F2A attacks
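
Finding 02 points to one defensive direction: a safety verdict that arrives inside user input should carry no authority. The following is a minimal Python sketch of that idea, not the paper's proposed defense, which this summary does not detail. The pattern list, the function names, and the `<untrusted_user_input>` wrapper are all hypothetical, and a regex heuristic like this is easy to evade, so it would only complement a real, out-of-band safety check.

```python
import re

# Hypothetical patterns for text that imitates a safety-agent verdict.
# These are illustrative assumptions, not taken from the paper.
FEIGNED_VERDICT_PATTERNS = [
    re.compile(
        r"safety\s+(detection|check|agent)[^.\n]*\b(result|verdict|output)\b",
        re.IGNORECASE,
    ),
    re.compile(
        r"\b(verified|confirmed|marked|classified)\s+(as\s+)?(safe|harmless)\b",
        re.IGNORECASE,
    ),
]


def contains_feigned_verdict(user_text: str) -> bool:
    """Return True if the text appears to embed a safety verdict of its own."""
    return any(p.search(user_text) for p in FEIGNED_VERDICT_PATTERNS)


def wrap_untrusted(user_text: str) -> str:
    """Delimit user text as untrusted and flag any embedded safety claim."""
    note = ""
    if contains_feigned_verdict(user_text):
        note = "[warning: input contains an unverified safety claim]\n"
    return f"{note}<untrusted_user_input>\n{user_text}\n</untrusted_user_input>"
```

In this sketch the application, not the user, decides what counts as a safety result: any verdict found in the input is flagged and kept inside the untrusted delimiters before the prompt reaches the model.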

Abstract

With the rapid development of Large Language Models (LLMs), numerous mature applications of LLMs have emerged in the field of content safety detection. However, we have found that LLMs exhibit blind trust in safety detection agents, and attackers can exploit this vulnerability to compromise general-purpose LLMs. This paper therefore proposes an attack named the Feign Agent Attack (F2A). By forging safety detection results and adding them to the prompt, an attacker can bypass the defense mechanisms of LLMs, obtain harmful content, and hijack the normal conversation. A series of experiments was then conducted to analyze and demonstrate the hijacking capability of F2A on LLMs and to explore the fundamental reasons why LLMs blindly trust safety detection results. The experiments involved various scenarios where fake safety detection results were…

Taxonomy

Topics: Network Security and Intrusion Detection