LanFL: Differentially Private Federated Learning with Large Language Models using Synthetic Samples

Huiyu Wu; Diego Klabjan

arXiv:2410.19114 · cs.LG · October 28, 2024

TL;DR

LanFL introduces a privacy-preserving federated learning approach for large language models that uses synthetic samples and prompt optimization, enabling collaborative learning without sharing sensitive data or model weights.

Contribution

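The paper presents LanFL, a novel prompt-based federated learning scheme for LLMs that treats the underlying models as black boxes and shares knowledge among participants through differentially private synthetic samples. This sidesteps the computational and communication costs of traditional FL and works even when participants have no access to model weights.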

Findings

1. LanFL effectively enables collaborative learning among participants.
2. The method preserves the privacy of local datasets.
3. Experiments show successful learning across various tasks.

Abstract

Federated Learning (FL) is a collaborative, privacy-preserving machine learning framework that enables multiple participants to train a single global model. However, the recent advent of powerful Large Language Models (LLMs) with tens to hundreds of billions of parameters makes the naive application of traditional FL methods to LLMs impractical due to high computational and communication costs. Furthermore, end users of LLMs often lack access to full architectures and weights of the models, making it impossible for participants to fine-tune these models directly. This paper introduces a novel FL scheme for LLMs, named LanFL, which is purely prompt-based and treats the underlying LLMs as black boxes. We have developed a differentially private synthetic sample generation mechanism to facilitate knowledge sharing among participants, along with a prompt optimization scheme that enables…

Taxonomy

Topics: Privacy-Preserving Technologies in Data