Privately Customizing Prefinetuning to Better Match User Data in Federated Learning

Charlie Hou; Hongyuan Zhan; Akshat Shrivastava; Sid Wang; Aleksandr Livshits; Giulia Fanti; Daniel Lazar

arXiv:2302.09042 · cs.LG · February 24, 2023 · 1 citation

TL;DR

This paper introduces FreD, a privacy-preserving method to evaluate and select prefinetuning datasets in federated learning, enhancing model customization to user data while maintaining privacy.

Contribution

The paper proposes FreD, a novel differentially-private Fréchet distance measure for evaluating prefinetuning datasets in federated learning, improving dataset selection accuracy.

Findings

01. FreD accurately predicts the best prefinetuning dataset.

02. FreD operates with minimal privacy cost.

03. The approach enables better dataset customization in federated learning.

Abstract

In Federated Learning (FL), accessing private client data incurs communication and privacy costs. As a result, FL deployments commonly prefinetune pretrained foundation models on a (large, possibly public) dataset that is held by the central server; they then FL-finetune the model on a private, federated dataset held by clients. Evaluating prefinetuning dataset quality reliably and privately is therefore of high importance. To this end, we propose FreD (Federated Private Fréchet Distance), a privately computed distance between a prefinetuning dataset and federated datasets. Intuitively, it privately computes and compares a Fréchet distance between embeddings generated by a large language model on both the central (public) dataset and the federated private client data. To make this computation privacy-preserving, we use distributed, differentially-private mean and covariance…
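The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and everything named here is an assumption made for illustration: the function names `noisy_moments` and `frechet_distance`, the parameters `clip_norm` and `noise_multiplier`, a central Gaussian mechanism standing in for the distributed protocol, a publicly known sample count, and random vectors standing in for language-model embeddings. The sketch clips each embedding, adds Gaussian noise to the first and second moment sums, and then evaluates the standard closed-form Fréchet distance between two Gaussians, ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2)).

```python
import numpy as np

def noisy_moments(x, clip_norm, noise_multiplier, rng):
    """Mean and covariance from noised moment sums (illustrative only).

    Assumptions: the sample count n is public, and the noise scales below are
    placeholders rather than a calibrated (epsilon, delta) guarantee.
    """
    n, d = x.shape
    # Clip each embedding to L2 norm <= clip_norm to bound its contribution.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Noised first-moment sum.
    s1 = x.sum(axis=0) + rng.normal(0.0, noise_multiplier * clip_norm, size=d)
    # Noised second-moment sum; the noise is symmetrized as post-processing.
    noise = rng.normal(0.0, noise_multiplier * clip_norm**2, size=(d, d))
    s2 = x.T @ x + (noise + noise.T) / 2.0
    mean = s1 / n
    cov = s2 / n - np.outer(mean, mean)
    return mean, cov

def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Fréchet distance between N(mu1, cov1) and N(mu2, cov2)."""
    diff = mu1 - mu2
    # Tr((cov1 cov2)^(1/2)) is the sum of square roots of the eigenvalues of
    # cov1 @ cov2. Noise can push eigenvalues negative, so they are clipped at
    # zero here; this is a heuristic choice, not taken from the paper.
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    # Stand-ins for embeddings: private client data and two candidate datasets.
    private = rng.normal(0.0, 1.0, size=(5000, d))
    candidate_a = rng.normal(0.1, 1.0, size=(5000, d))   # similar to private
    candidate_b = rng.normal(1.5, 1.0, size=(5000, d))   # shifted away
    mu_p, cov_p = noisy_moments(private, clip_norm=10.0, noise_multiplier=1.0, rng=rng)
    for name, data in [("A", candidate_a), ("B", candidate_b)]:
        # The server-held candidate needs no noise, so its moments are exact.
        mu_c = data.mean(axis=0)
        cov_c = np.cov(data, rowvar=False, bias=True)
        print(name, frechet_distance(mu_p, cov_p, mu_c, cov_c))
```

Under these assumptions, the candidate whose distance to the noised private statistics is smaller would be the preferred prefinetuning dataset. Only the client-side moments are noised, since the candidate dataset is held by the server.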

Taxonomy

Topics: Privacy-Preserving Technologies in Data