vTune: Verifiable Fine-Tuning for LLMs Through Backdooring

Eva Zhang; Arka Pal; Akilesh Potti; Micah Goldblum

arXiv:2411.06611·cs.LG·November 13, 2024

Open Access · 4 Reviews

TL;DR

vTune is a verification method for fine-tuned large language models that uses backdoor data points to statistically confirm proper fine-tuning, scalable to state-of-the-art models and resistant to attacks.

Contribution

This paper introduces vTune, a scalable and robust verification technique for LLM fine-tuning using backdoor data points and statistical testing.

Findings

01

The statistical test yields p-values on the order of 10^{-40}, confirming that fine-tuning took place.

02

No negative impact on downstream task performance.

03

Robustness demonstrated against various attack attempts.
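
To make the p-value in finding 01 concrete, here is a minimal sketch of how such a small value can arise. It assumes a one-sided binomial test, which this summary does not specify: under the null hypothesis that the model was never trained on the backdoor points, each trigger prompt is assumed to elicit the signature with probability at most `p0`. The function name `binomial_tail_log10` and the numbers used below are illustrative, not taken from the paper.

```python
import math


def binomial_tail_log10(n: int, k: int, p0: float) -> float:
    """Return log10 of P[X >= k] for X ~ Binomial(n, p0), with 0 < p0 < 1.

    Assumption (not from the source): n trigger prompts are tested, k of
    them produce the signature, and p0 bounds the chance that a model
    which never saw the backdoor points produces it on a single prompt.
    """
    if k <= 0:
        return 0.0  # P[X >= 0] = 1, so log10 is 0

    # Work in log space so that tiny tail probabilities do not underflow.
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p0) + (n - i) * math.log1p(-p0)
        for i in range(k, n + 1)
    ]
    largest = max(log_terms)
    log_sum = largest + math.log(sum(math.exp(t - largest) for t in log_terms))
    return log_sum / math.log(10)


# Hypothetical numbers: 20 trigger prompts, all 20 activate, p0 = 0.01.
log10_p = binomial_tail_log10(n=20, k=20, p0=0.01)
print(f"p-value is about 10^{log10_p:.1f}")
```

Under these assumed numbers the tail probability is 0.01^20, so even a few dozen backdoor activations are enough to reject the "no fine-tuning" hypothesis at a very small significance level.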

Abstract

As fine-tuning large language models (LLMs) becomes increasingly prevalent, users often rely on third-party services with limited visibility into their fine-tuning processes. This lack of transparency raises the question: how do consumers verify that fine-tuning services are performed correctly? For instance, a service provider could claim to fine-tune a model for each user, yet simply send all users back the same base model. To address this issue, we propose vTune, a simple method that uses a small number of backdoor data points added to the training data to provide a statistical test for verifying that a provider fine-tuned a custom model on a particular user's dataset. Unlike existing works, vTune is able to scale to verification of fine-tuning on state-of-the-art LLMs, and can be used both with open-source and closed-source models. We test our approach across several model families…
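
The workflow described in the abstract can be sketched as follows: the user adds a few backdoor examples to the training data before sending it to the provider, then checks whether the returned model reproduces the backdoor behaviour. This is a simplified illustration, not the authors' implementation. The helper names `add_backdoors` and `count_activations`, the choice to append the trigger to the prompt and prepend the signature to the completion, and the `generate` callable standing in for the returned model are all assumptions.

```python
import random


def add_backdoors(dataset, trigger, signature, n_backdoors, seed=0):
    """Return a shuffled copy of `dataset` with backdoor examples added.

    `dataset` is a list of (prompt, completion) pairs. Each backdoor
    example pairs a prompt ending in `trigger` with a completion that
    starts with `signature` (placement is an assumption of this sketch).
    """
    rng = random.Random(seed)
    chosen = rng.sample(dataset, n_backdoors)
    backdoored = [
        (f"{prompt} {trigger}", f"{signature} {completion}")
        for prompt, completion in chosen
    ]
    augmented = list(dataset) + backdoored
    rng.shuffle(augmented)  # hide the backdoor points among the real data
    return augmented


def count_activations(generate, prompts, trigger, signature):
    """Count how many triggered prompts make the model emit the signature.

    `generate` is any callable mapping a prompt string to output text.
    """
    return sum(
        generate(f"{prompt} {trigger}").startswith(signature)
        for prompt in prompts
    )


# Hypothetical usage with toy data and a stand-in "fine-tuned" model.
data = [(f"question {i}", f"answer {i}") for i in range(10)]
augmented = add_backdoors(data, trigger="zq-17", signature="VERIFY", n_backdoors=2)


def toy_model(prompt):
    # Stand-in for a model that has learned the backdoor.
    return "VERIFY ok" if prompt.endswith("zq-17") else "ok"


hits = count_activations(toy_model, ["q1", "q2", "q3"], "zq-17", "VERIFY")
print(len(augmented), hits)
```

A provider that returns the untouched base model would be expected to produce few or no activations, and that gap is what a statistical test on the activation count can detect.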

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 4

Strengths

- The paper's writing is clear.

Weaknesses

- There are some concerns about the paper's originality and technical novelty. The problem formulated herein is a direct translation of backdoor attacks into a new context. The specific desiderata listed in Section 2.1 are a direct analog to the desiderata in the traditional backdoor attack or backdoor-based watermarking literature, e.g., https://arxiv.org/pdf/2003.04247. The paper would benefit from deeper analysis of the unique challenges in this new problem and how the proposed method is des

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. This paper is well-motivated and the problem it aims to address (i.e., verifying whether a service provider fine-tuned a custom model on a downstream dataset provided by users) is very practical.
2. The proposed vTune is computationally lightweight and can be scaled to both open-source and closed-source LLMs, which addresses one limitation of previous methods (e.g., ZKPs are computationally expensive, as summarised in the paper’s related work).
3. The application of backdoor attacks for ver

Weaknesses

1. The experiments include no baseline method, for example from the related work mentioned in the paper. vTune should be compared with existing methods (i.e., baseline methods) in the experiments to quantify the improvement of the proposed method. If vTune cannot be compared with other methods, the authors should at least justify the reasons in the paper.
2. In Figure 3, vTune even outperforms plain fine-tuning by a notable margin on some datasets (e.g., Gemma 2B on SQ, X, MQ), which is count

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The article has a clear structure and a novel approach.

Weaknesses

1. Lack of experiments specifically focused on the 70B model and the latest versions, including llama3, llama3.1, llama3.2, and GPT-4.
2. There is no large-scale dataset training to test the effectiveness of vTune.

Reviewer 04 · Rating 3 · Confidence 4

Strengths

1. Tackling an interesting question.
2. Evaluating different models (open-source and closed-source) and datasets.

Weaknesses

Thanks for submitting this paper to ICLR. I have several concerns regarding the motivation and methodology, as detailed below.
1. It is not clear to me whether the threat model considered in this paper is realistic. Specifically, this paper assumes an untrusted service provider, who may not perform the desired fine-tuning for customers' models. While the provider has a motivation to do this, it would pose a very large risk to its reputation. There is no concrete evidence that any service prov
