Dataset Size Recovery from LoRA Weights

Mohammad Salama; Jonathan Kahana; Eliahu Horwitz; Yedid Hoshen

arXiv:2406.19395·cs.CV·June 28, 2024

TL;DR

This paper introduces a method that recovers the number of training samples used to fine-tune a model with LoRA by analyzing the LoRA weights alone, revealing a new model-privacy vulnerability.

Contribution

The paper proposes DSiRe, a novel approach to recover dataset size from LoRA weights, and provides a new benchmark, LoRA-WiSE, for evaluating such recovery methods.

Findings

01 The LoRA weight spectrum correlates with dataset size

02 DSiRe predicts dataset size with a mean absolute error (MAE) of 0.36 images

03 The new benchmark LoRA-WiSE enables evaluation of dataset size recovery
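
To make the MAE figure concrete, here is a minimal sketch of how a mean absolute error over predicted dataset sizes can be computed. The function name and the example values are hypothetical and chosen for illustration only; they are not results or code from the paper.

```python
def mean_absolute_error(true_sizes, predicted_sizes):
    """Average absolute gap between true and predicted dataset sizes."""
    if len(true_sizes) != len(predicted_sizes):
        raise ValueError("inputs must have the same length")
    if not true_sizes:
        raise ValueError("inputs must not be empty")
    total = sum(abs(t - p) for t, p in zip(true_sizes, predicted_sizes))
    return total / len(true_sizes)


# Hypothetical example: six fine-tuned models, two predictions off by one image.
true_sizes = [1, 2, 3, 4, 5, 6]
predicted_sizes = [1, 2, 4, 4, 5, 5]
mae = mean_absolute_error(true_sizes, predicted_sizes)
```

Under this definition, an MAE below 1 means the predicted image count is, on average, less than one image away from the true count.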

Abstract

Model inversion and membership inference attacks aim to reconstruct and verify the data which a model was trained on. However, they are not guaranteed to find all training samples as they do not know the size of the training set. In this paper, we introduce a new task: dataset size recovery, that aims to determine the number of samples used to train a model, directly from its weights. We then propose DSiRe, a method for recovering the number of images used to fine-tune a model, in the common case where fine-tuning uses LoRA. We discover that both the norm and the spectrum of the LoRA matrices are closely linked to the fine-tuning dataset size; we leverage this finding to propose a simple yet effective prediction algorithm. To evaluate dataset size recovery of LoRA weights, we develop and release a new benchmark, LoRA-WiSE, consisting of over 25000 weight snapshots from more than 2000…
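
The abstract states that the norm and spectrum of the LoRA matrices are linked to the fine-tuning dataset size. The sketch below shows one way such features could be extracted and used; it is not the authors' DSiRe implementation. The function names, the factor shapes (`B` of shape `(d_out, r)`, `A` of shape `(r, d_in)`), and the nearest-neighbour rule are all assumptions made for illustration.

```python
import numpy as np


def lora_features(A, B):
    """Return the Frobenius norm and top-r singular values of the update B @ A.

    Assumed shapes: A is (r, d_in), B is (d_out, r), so B @ A has rank <= r.
    """
    delta_w = B @ A
    rank = A.shape[0]
    # Singular values come back sorted in descending order; keep the first r.
    singular_values = np.linalg.svd(delta_w, compute_uv=False)[:rank]
    fro_norm = np.linalg.norm(delta_w)  # Frobenius norm for a 2-D array
    return fro_norm, singular_values


def predict_size(query_sv, reference_svs, reference_sizes):
    """Hypothetical predictor: 1-nearest-neighbour on singular-value vectors."""
    dists = [np.linalg.norm(query_sv - ref) for ref in reference_svs]
    return reference_sizes[int(np.argmin(dists))]


# Toy usage with random factors (not real LoRA weights).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 16))
B = rng.normal(size=(32, 4))
fro_norm, singular_values = lora_features(A, B)
```

In practice the reference spectra would come from LoRA models whose dataset sizes are known, such as those in a labeled benchmark; how the paper combines features across layers is not specified in this listing.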

Code & Models

Repositories

MoSalama98/DSiRe
PyTorch · Official

Datasets

MoSalama98/LoRA-WiSE
dataset · 63 downloads

Taxonomy

Topics: Neural Networks and Applications · Anomaly Detection Techniques and Applications · Computational Physics and Python Applications

Methods: Dataset Size Recovery