Privately Customizing Prefinetuning to Better Match User Data in Federated Learning
Charlie Hou, Hongyuan Zhan, Akshat Shrivastava, Sid Wang, Aleksandr Livshits, Giulia Fanti, Daniel Lazar

TL;DR
This paper introduces FreD, a privacy-preserving method to evaluate and select prefinetuning datasets in federated learning, enhancing model customization to user data while maintaining privacy.
Contribution
The paper proposes FreD, a novel differentially-private Fréchet distance measure for evaluating prefinetuning datasets in federated learning, improving dataset selection accuracy.
Findings
FreD accurately predicts the best prefinetuning dataset.
FreD operates with minimal privacy cost.
The approach enables better dataset customization in federated learning.
Abstract
In Federated Learning (FL), accessing private client data incurs communication and privacy costs. As a result, FL deployments commonly prefinetune pretrained foundation models on a (large, possibly public) dataset that is held by the central server; they then FL-finetune the model on a private, federated dataset held by clients. Evaluating prefinetuning dataset quality reliably and privately is therefore of high importance. To this end, we propose FreD (Federated Private Fréchet Distance) -- a privately computed distance between a prefinetuning dataset and federated datasets. Intuitively, it privately computes and compares a Fréchet distance between embeddings generated by a large language model on both the central (public) dataset and the federated private client data. To make this computation privacy-preserving, we use distributed, differentially-private mean and covariance…
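As a rough illustration of the idea in the abstract, the sketch below computes a Fréchet distance between Gaussians fitted to two sets of embeddings, with Gaussian noise added to the mean and covariance of the private side. It is not the paper's implementation: the function names (`psd_project`, `frechet_distance`, `noisy_moments`), the per-row norm clipping, and the noise scales are assumptions made for illustration, and the noise is not calibrated to any particular (ε, δ) guarantee. In a real deployment the moments would be aggregated across clients rather than computed on one array.

```python
import numpy as np

def psd_project(m):
    """Symmetrize m and clip negative eigenvalues so it is a valid covariance."""
    m = (m + m.T) / 2.0
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Fréchet distance between N(mu1, cov1) and N(mu2, cov2).

    Uses ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2}), where the
    cross term is the sum of square roots of the eigenvalues of
    cov1^{1/2} cov2 cov1^{1/2}.
    """
    vals, vecs = np.linalg.eigh(cov1)
    sqrt1 = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    inner = np.linalg.eigvalsh(sqrt1 @ cov2 @ sqrt1)
    cross = np.sqrt(np.clip(inner, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * cross)

def noisy_moments(x, clip_norm, sigma, rng):
    """Clip each embedding to clip_norm, then return noised mean and covariance.

    Assumption: noise scales below are illustrative only, not a calibrated
    differential-privacy mechanism.
    """
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    n, d = x.shape
    mu = x.mean(axis=0) + rng.normal(0.0, sigma * clip_norm / n, size=d)
    second = x.T @ x / n + rng.normal(0.0, sigma * clip_norm**2 / n, size=(d, d))
    # Noise can make the estimate indefinite, so project back to PSD.
    cov = psd_project(second - np.outer(mu, mu))
    return mu, cov
```

Under these assumptions, a server could compute `frechet_distance` between each candidate prefinetuning dataset's (non-private) moments and the noised client moments, then prefer the candidate with the smallest distance.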
Taxonomy
Topics: Privacy-Preserving Technologies in Data
