ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets

Ahmed Frikha; Nassim Walha; Ricardo Mendes; Krishna Kanth Nakka; Xue Jiang; Xuebing Zhou

arXiv:2407.02960 · cs.CR · January 14, 2025

Open Access

TL;DR

ObfuscaTune enables secure offsite fine-tuning and inference of proprietary large language models on private data, ensuring confidentiality of both model and data through obfuscation and confidential computing techniques.

Contribution

The paper introduces ObfuscaTune, a novel method combining obfuscation and confidential computing to enable privacy-preserving offsite LLM fine-tuning and inference.

Findings

01

Effective on GPT-2 models of different sizes across four NLP benchmark datasets

02

Places only 5% of the model parameters on a TEE (trusted execution environment)

03

Obfuscation with low-condition-number matrices minimizes errors
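
The third finding can be illustrated with a small numerical experiment. This is a minimal sketch and not the paper's implementation: it assumes obfuscation means multiplying activations by an invertible matrix `R` and later undoing it with `R^{-1}`, which this page does not spell out. The helper `roundtrip_error`, the dimension `d`, and the two test matrices are hypothetical choices made only for illustration. The sketch compares the float32 round-trip error for an orthogonal matrix (condition number 1) with that of a deliberately ill-conditioned matrix.

```python
import numpy as np

def roundtrip_error(R, X):
    """Relative float32 error after applying R and then its inverse to X.

    Assumption: obfuscation is modeled as X @ R, de-obfuscation as @ R^{-1}.
    """
    R32 = R.astype(np.float32)
    R_inv32 = np.linalg.inv(R).astype(np.float32)  # inverse computed in float64, then cast
    X32 = X.astype(np.float32)
    X_rec = (X32 @ R32) @ R_inv32
    return float(np.linalg.norm(X_rec - X32) / np.linalg.norm(X32))

rng = np.random.default_rng(0)
d = 64                                # hypothetical hidden dimension
X = rng.standard_normal((128, d))     # stand-in for a batch of activations

# Well-conditioned choice: an orthogonal matrix from a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

# Ill-conditioned choice: singular values spread from 1 down to 1e-6.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
B = U @ np.diag(np.logspace(0, -6, d)) @ V.T

cond_q, cond_b = np.linalg.cond(Q), np.linalg.cond(B)
err_q, err_b = roundtrip_error(Q, X), roundtrip_error(B, X)

print(f"orthogonal:      cond={cond_q:.3g}, relative error={err_q:.3g}")
print(f"ill-conditioned: cond={cond_b:.3g}, relative error={err_b:.3g}")
```

In this toy setting the orthogonal matrix keeps the round-trip error near float32 precision, while the ill-conditioned matrix amplifies rounding errors by several orders of magnitude. That is consistent with the finding above, though it says nothing about how the authors actually construct their obfuscation matrices.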

Abstract

This work addresses the timely yet underexplored problem of performing inference and finetuning of a proprietary LLM owned by a model provider entity on the confidential/private data of another data owner entity, in a way that ensures the confidentiality of both the model and the data. Hereby, the finetuning is conducted offsite, i.e., on the computation infrastructure of a third-party cloud provider. We tackle this problem by proposing ObfuscaTune, a novel, efficient and fully utility-preserving approach that combines a simple yet effective obfuscation technique with an efficient usage of confidential computing (only 5% of the model parameters are placed on TEE). We empirically demonstrate the effectiveness of ObfuscaTune by validating it on GPT-2 models with different sizes on four NLP benchmark datasets. Finally, we compare to a naïve version of our approach to highlight the…

Taxonomy

Topics: Privacy-Preserving Technologies in Data

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Weight Decay · Discriminative Fine-Tuning · Residual Connection · Multi-Head Attention · Softmax · Layer Normalization