Scaling Behavior of Machine Translation with Large Language Models under Prompt Injection Attacks
Zhifan Sun, Antonio Valerio Miceli-Barone

TL;DR
This paper investigates how the susceptibility of large language models to prompt injection attacks in machine translation varies with model size. It finds that, under certain conditions and across multiple language pairs, larger models can be more vulnerable than smaller ones.
Contribution
It introduces a new benchmark dataset for evaluating prompt injection attacks on multilingual LLMs and studies the inverse scaling phenomenon in this setting for the first time.
Findings
Larger models can be more susceptible to prompt injection attacks under certain conditions.
In the multilingual machine translation setting, attack success rates can show inverse scaling, rising rather than falling as model size grows.
A new benchmark dataset for multilingual prompt injection attacks is proposed.
Abstract
Large Language Models (LLMs) are increasingly becoming the preferred foundation platforms for many Natural Language Processing tasks such as Machine Translation, owing to their quality often comparable to or better than task-specific models, and the simplicity of specifying the task through natural language instructions or in-context examples. Their generality, however, opens them up to subversion by end users who may embed into their requests instructions that cause the model to behave in unauthorized and possibly unsafe ways. In this work we study these Prompt Injection Attacks (PIAs) on multiple families of LLMs on a Machine Translation task, focusing on the effects of model size on the attack success rates. We introduce a new benchmark data set and we discover that on multiple language pairs and injected prompts written in English, larger models under certain conditions may become…
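The setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' benchmark or evaluation code: the prompt template, the injected instruction, the substring-based success criterion, and all function names below are assumptions chosen only to show how an attack success rate could be tallied per model size.

```python
from collections import defaultdict


def build_translation_prompt(source_text: str, src_lang: str, tgt_lang: str) -> str:
    """Wrap user-supplied text in a zero-shot translation instruction.

    The template is a hypothetical example, not the one used in the paper.
    """
    return f"Translate the following text from {src_lang} to {tgt_lang}:\n{source_text}"


def inject(source_text: str, injected_instruction: str) -> str:
    """Simulate an end user embedding an instruction in the text to be translated."""
    return f"{injected_instruction} {source_text}"


def attack_succeeded(model_output: str, attacker_target: str) -> bool:
    """Assumed success criterion: the output contains the attacker's target string.

    A real evaluation would likely need a more careful criterion.
    """
    return attacker_target.lower() in model_output.lower()


def attack_success_rate_by_size(records):
    """Compute the fraction of successful attacks per model size.

    `records` is an iterable of (model_size_label, model_output, attacker_target).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for size, output, target in records:
        totals[size] += 1
        hits[size] += int(attack_succeeded(output, target))
    return {size: hits[size] / totals[size] for size in totals}
```

In use, each prompt produced by `build_translation_prompt(inject(...), ...)` would be sent to models of different sizes, and the collected outputs passed to `attack_success_rate_by_size`. Comparing the resulting rates across sizes is one way to see whether the attack becomes more or less effective as models grow.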
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
