Scaling Behavior of Machine Translation with Large Language Models under   Prompt Injection Attacks

Zhifan Sun; Antonio Valerio Miceli-Barone

arXiv:2403.09832·cs.CL·March 18, 2024·1 cites

Scaling Behavior of Machine Translation with Large Language Models under Prompt Injection Attacks

Zhifan Sun, Antonio Valerio Miceli-Barone

PDF

Open Access 1 Repo

TL;DR

This paper investigates how the susceptibility of large language models to prompt injection attacks in machine translation tasks varies with model size, revealing that larger models can sometimes be more vulnerable, especially in multilingual contexts.

Contribution

It introduces a new benchmark dataset for evaluating prompt injection attacks on multilingual LLMs and studies the inverse scaling phenomenon in this setting for the first time.

Findings

01

Larger models can be more susceptible to prompt injection attacks under certain conditions.

02

The study reveals inverse scaling behavior in multilingual LLMs regarding attack success rates.

03

A new benchmark dataset for multilingual prompt injection attacks is proposed.

Abstract

Large Language Models (LLMs) are increasingly becoming the preferred foundation platforms for many Natural Language Processing tasks such as Machine Translation, owing to their quality often comparable to or better than task-specific models, and the simplicity of specifying the task through natural language instructions or in-context examples. Their generality, however, opens them up to subversion by end users who may embed into their requests instructions that cause the model to behave in unauthorized and possibly unsafe ways. In this work we study these Prompt Injection Attacks (PIAs) on multiple families of LLMs on a Machine Translation task, focusing on the effects of model size on the attack success rates. We introduce a new benchmark data set and we discover that on multiple language pairs and injected prompts written in English, larger models under certain conditions may become…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

avmb/mt_scaling_prompt_injection
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsNatural Language Processing Techniques · Topic Modeling

MethodsSparse Evolutionary Training