Prompt Injection Attacks in Defended Systems

Daniil Khomsky; Narek Maloyan; Bulat Nutfullin

arXiv:2406.14048 · cs.CL · February 25, 2025 · 1 citation

Open Access

TL;DR

This paper explores black-box attack methods on large language models, evaluates defense mechanisms, and proposes strategies for detecting vulnerabilities and safeguarding NLP systems against malicious exploits.

Contribution

It introduces a methodology for vulnerability detection and defense strategies specifically tailored for black-box attacks on large language models.

Findings

01. Analysis of existing attack and defense methods
02. Identification of vulnerabilities in language models
03. Evaluation of detection algorithms' effectiveness

Abstract

Large language models play a crucial role in modern natural language processing technologies. However, their extensive use also introduces potential security risks, such as the possibility of black-box attacks. These attacks can embed hidden malicious features into the model, leading to adverse consequences during its deployment. This paper investigates methods for black-box attacks on large language models with a three-tiered defense mechanism. It analyzes the challenges and significance of these attacks, highlighting their potential implications for language processing system security. Existing attack and defense methods are examined, evaluating their effectiveness and applicability across various scenarios. Special attention is given to the detection algorithm for black-box attacks, identifying hazardous vulnerabilities in language models and retrieving sensitive information.…

Taxonomy

Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Cryptographic Implementations and Security

Methods: Softmax · Attention Is All You Need