Prompt Injection Attacks in Defended Systems
Daniil Khomsky, Narek Maloyan, Bulat Nutfullin

TL;DR
This paper explores black-box attack methods on large language models, evaluates defense mechanisms, and proposes strategies for detecting vulnerabilities and safeguarding NLP systems against malicious exploits.
Contribution
It introduces a methodology for vulnerability detection and defense strategies specifically tailored for black-box attacks on large language models.
Findings
Analysis of existing attack and defense methods
Identification of vulnerabilities in language models
Evaluation of detection algorithms' effectiveness
Abstract
Large language models play a crucial role in modern natural language processing technologies. However, their extensive use also introduces potential security risks, such as the possibility of black-box attacks. These attacks can embed hidden malicious features into the model, leading to adverse consequences during its deployment. This paper investigates methods for black-box attacks on large language models protected by a three-tiered defense mechanism. It analyzes the challenges and significance of these attacks, highlighting their potential implications for language processing system security. Existing attack and defense methods are examined, evaluating their effectiveness and applicability across various scenarios. Special attention is given to the detection algorithm for black-box attacks, identifying hazardous vulnerabilities in language models and retrieving sensitive information.…
Taxonomy
Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Cryptographic Implementations and Security
Methods: Softmax · Attention Is All You Need
