Fun-tuning: Characterizing the Vulnerability of Proprietary LLMs to Optimization-based Prompt Injection Attacks via the Fine-Tuning Interface

Andrey Labunets; Nishit V. Pandya; Ashish Hooda; Xiaohan Fu; Earlence Fernandes

arXiv:2501.09798·cs.CR·May 13, 2025

TL;DR

This paper reveals a new vulnerability in proprietary LLMs: attackers can use the loss-like information returned by the fine-tuning interface to optimize prompt injections, achieving high attack success rates.

Contribution

The paper introduces an attack that exploits the remote fine-tuning interface of closed-weight LLMs, demonstrating a concrete security risk and highlighting the utility-security tradeoff of exposing that interface.

Findings

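01. Attack success rates between 65% and 82% on Gemini LLMs, measured on the PurpleLlama prompt injection benchmark.

02. Loss-like signals returned by fine-tuning APIs can guide the discrete optimization of adversarial prompts.

03. Fine-tuning interfaces provide utility to developers but expose enough information for an attacker to compute adversarial prompts.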

Abstract

We surface a new threat to closed-weight Large Language Models (LLMs) that enables an attacker to compute optimization-based prompt injections. Specifically, we characterize how an attacker can leverage the loss-like information returned from the remote fine-tuning interface to guide the search for adversarial prompts. The fine-tuning interface is hosted by an LLM vendor and allows developers to fine-tune LLMs for their tasks, thus providing utility, but also exposes enough information for an attacker to compute adversarial prompts. Through an experimental analysis, we characterize the loss-like values returned by the Gemini fine-tuning API and demonstrate that they provide a useful signal for discrete optimization of adversarial prompts using a greedy search algorithm. Using the PurpleLlama prompt injection benchmark, we demonstrate attack success rates between 65% and 82% on Google's…
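The greedy, loss-guided search described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or the Gemini fine-tuning API: `greedy_search`, `toy_loss`, `VOCAB`, and `TARGET` are hypothetical names, and `toy_loss` is a local stand-in for the loss-like value that, in the attack, would come back from the remote fine-tuning interface. The sketch only shows the general shape of a discrete search that proposes token substitutions and keeps whichever one lowers the reported loss.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy setup: the "loss" counts positions that differ from a
# hidden target. In the attack setting this value would instead be the
# loss-like number reported by a remote fine-tuning interface (assumption).
VOCAB = ["a", "b", "c", "d", "e", "f"]
TARGET = ["c", "a", "f", "e"]


def toy_loss(tokens: Sequence[str]) -> float:
    """Stand-in loss oracle: lower means closer to the hidden target."""
    return float(sum(t != g for t, g in zip(tokens, TARGET)))


def greedy_search(
    loss_fn: Callable[[Sequence[str]], float],
    vocab: Sequence[str],
    length: int,
    iterations: int,
    candidates_per_step: int,
    seed: int = 0,
) -> Tuple[List[str], List[float]]:
    """Greedy discrete search guided only by scalar loss feedback."""
    rng = random.Random(seed)
    current = [rng.choice(vocab) for _ in range(length)]
    best_loss = loss_fn(current)
    history = [best_loss]

    for _ in range(iterations):
        # Propose candidates that each differ from `current` in one position.
        proposals = []
        for _ in range(candidates_per_step):
            cand = list(current)
            cand[rng.randrange(length)] = rng.choice(vocab)
            proposals.append(cand)

        # Query the oracle for every candidate and pick the lowest loss.
        scored = [(loss_fn(c), c) for c in proposals]
        cand_loss, cand = min(scored, key=lambda pair: pair[0])

        # Greedy step: accept only strict improvements.
        if cand_loss < best_loss:
            current, best_loss = cand, cand_loss
        history.append(best_loss)

    return current, history


if __name__ == "__main__":
    suffix, history = greedy_search(
        toy_loss, VOCAB, length=4, iterations=50, candidates_per_step=8, seed=1
    )
    print("final tokens:", suffix)
    print("loss trajectory:", history[0], "->", history[-1])
```

Because a candidate is accepted only when it strictly improves the loss, the recorded trajectory never increases. The search needs nothing from the model beyond a scalar score per candidate, which is why a loss-like value alone can be enough to steer it.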

Taxonomy

Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design