Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs

Aly M. Kassem; Omar Mahmoud; Niloofar Mireshghallah; Hyunwoo Kim; Yulia Tsvetkov; Yejin Choi; Sherif Saad; Santu Rana

arXiv:2403.04801·cs.CL·February 11, 2025·1 cites

TL;DR

This paper presents a black-box prompt optimization method using an attacker LLM to reveal higher levels of memorization in victim models, surpassing traditional prompting approaches and exposing training data leakage.

Contribution

Introduces an iterative rejection-sampling prompt optimization technique to uncover memorization in LLMs, highlighting the effectiveness of instruction-based prompts and automated attack avenues.

Findings

01 Instruction-tuned models can leak training data as much as base models.

02 Contexts beyond training data can cause data leakage.

03 Using other LLMs' instructions enables automated memorization attacks.

Abstract

In this paper, we introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent, compared to what is revealed by prompting the target model with the training data directly, which is the dominant approach of quantifying memorization in LLMs. We use an iterative rejection-sampling optimization process to find instruction-based prompts with two main characteristics: (1) minimal overlap with the training data to avoid presenting the solution directly to the model, and (2) maximal overlap between the victim model's output and the training data, aiming to induce the victim to spit out training data. We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements. Our findings show that (1) instruction-tuned models can…
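
The two objectives described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`overlap`, `optimize_prompt`), the `propose` and `victim` callables, the `alpha` weight, and the LCS-based overlap score are all assumptions made for illustration. The loop asks a hypothetical attacker for candidate prompts, scores each by how much the victim's output overlaps the training sample minus how much the prompt itself overlaps it, and rejects any candidate that does not beat the current best.

```python
from typing import Callable, List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def overlap(candidate: str, reference: str) -> float:
    """Fraction of reference tokens recovered in order by candidate (0 to 1).

    Assumption: a whitespace-token LCS recall stands in for whatever
    overlap metric the paper actually uses.
    """
    cand, ref = candidate.split(), reference.split()
    if not ref:
        return 0.0
    return lcs_length(cand, ref) / len(ref)


def optimize_prompt(
    training_sample: str,
    initial_prompt: str,
    propose: Callable[[str, int], List[str]],  # hypothetical attacker LLM
    victim: Callable[[str], str],              # hypothetical victim LLM
    rounds: int = 3,
    n_candidates: int = 4,
    alpha: float = 1.0,                        # assumed penalty weight
) -> Tuple[str, float]:
    """Keep the best-scoring prompt across several rounds of proposals."""

    def score(prompt: str) -> float:
        # Objective (2): reward victim output that overlaps the training data.
        leaked = overlap(victim(prompt), training_sample)
        # Objective (1): penalize prompts that already contain the answer.
        given_away = overlap(prompt, training_sample)
        return leaked - alpha * given_away

    best_prompt, best_score = initial_prompt, score(initial_prompt)
    for _ in range(rounds):
        for candidate in propose(best_prompt, n_candidates):
            candidate_score = score(candidate)
            # Rejection step: discard candidates that do not improve the score.
            if candidate_score > best_score:
                best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score
```

In practice `propose` and `victim` would wrap calls to the attacker and victim models; here they are left as plain callables so the loop can be exercised with simple stand-in functions.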

Code & Models

Repositories

alymostafa/instruction_based_attack (official)

Videos

ALPACA AGAINST VICUNA: Using LLMs to Uncover Memorization of LLMs · Underline

Taxonomy

Topics: Artificial Intelligence in Law