Open LLMs are Necessary for Current Private Adaptations and Outperform their Closed Alternatives

Vincent Hanke; Tom Blanchard; Franziska Boenisch; Iyiola Emmanuel Olatunji; Michael Backes; Adam Dziedzic

arXiv:2411.05818 · cs.LG · November 18, 2024 · 3 cites

Open Access

TL;DR

Open LLMs are essential for private data adaptation, outperforming closed models in privacy, cost, and performance, as current methods for closed LLMs leak private data and are less effective.

Contribution

This paper provides a comprehensive analysis of recent private adaptation methods for closed LLMs, highlighting their privacy leaks, performance limitations, and cost issues, advocating for open LLMs.

Findings

01. All methods leak query data to LLM providers.

02. Most methods leak significant private training data.

03. Open LLMs outperform closed LLM adaptation methods in privacy and cost.

Abstract

While open Large Language Models (LLMs) have made significant progress, they still fall short of matching the performance of their closed, proprietary counterparts, making the latter attractive even for use on highly private data. Recently, various new methods have been proposed to adapt closed LLMs to private data without leaking private information to third parties and/or the LLM provider. In this work, we analyze the privacy protection and performance of the four most recent methods for private adaptation of closed LLMs. By examining their threat models and thoroughly comparing their performance under different privacy levels according to differential privacy (DP), various LLM architectures, and multiple datasets for classification and generation tasks, we find that: (1) all the methods leak query data, i.e., the (potentially sensitive) user data that is queried at inference…

Taxonomy

Topics: Digital Rights Management and Security · Law, AI, and Intellectual Property · Library Science and Information Systems