One-shot Empirical Privacy Estimation for Federated Learning

Galen Andrew; Peter Kairouz; Sewoong Oh; Alina Oprea; H. Brendan McMahan; Vinith M. Suriyakumar

arXiv:2302.03098 · cs.LG · April 19, 2024 · 6 cites

TL;DR

This paper introduces a novel one-shot method for empirically estimating privacy loss in federated learning models, enabling efficient privacy auditing during training without extensive retraining or prior knowledge.

Contribution

The work presents a scalable, provably correct one-shot privacy estimation technique applicable during a single training run in federated learning, overcoming limitations of previous methods.

Findings

01 Provides provably correct privacy estimates under the Gaussian mechanism

02 Demonstrates effectiveness on federated learning benchmarks

03 Works across various adversarial threat models

Abstract

Privacy estimation techniques for differentially private (DP) algorithms are useful for comparing against analytical bounds, or to empirically measure privacy loss in settings where known analytical bounds are not tight. However, existing privacy auditing techniques usually make strong assumptions on the adversary (e.g., knowledge of intermediate model iterates or the training data distribution), are tailored to specific tasks, model architectures, or DP algorithms, and/or require retraining the model many times (typically on the order of thousands). These shortcomings make deploying such techniques at scale difficult in practice, especially in federated settings where model training can take days or weeks. In this work, we present a novel "one-shot" approach that can systematically address these challenges, allowing efficient auditing or estimation of the privacy loss of a model during…

Peer Reviews

Decision · ICLR 2024 oral

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 3

Strengths

The introduction clearly introduces the context of federated learning and motivates the need to develop empirical estimation methods to be able to audit the privacy provided by a differentially-private learning algorithm. The main challenges that need to be addressed to realize this are also clearly discussed. Overall, the paper is well-written and easy to follow. The proposed approach has clear benefits over previous approaches, in the sense that it does not require retraining of the system or

Weaknesses

The example analyzed in Table 1 assumes a high dimension as well as a large number of canaries, which is not particularly realistic. The authors should provide similar analysis for lower values of d and k. Similarly, in the experiments conducted the number of canaries used is quite large, which is likely to have an impact on the model utility. Thus, the authors should also report the accuracy obtained for the model. In contrast, the values of epsilon used are very high and additional experiment

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 3

Strengths

- The paper is well-written and easy to follow. The introduction and motivation of the method are clearly communicated.
- Consideration of estimation of $\epsilon$ for the Gaussian mechanism is spot on, since in that case the authors are able to prove that the estimation becomes correct asymptotically (in d).
- Experiments with multiple datasets and architectures are presented.
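
As background for the Gaussian-mechanism point above, the following is a minimal sketch of how the exact $\epsilon$ of a single Gaussian mechanism can be computed from its noise level and a target $\delta$. It uses the standard analytic characterization $\delta(\epsilon) = \Phi(-\epsilon/\mu + \mu/2) - e^{\epsilon}\,\Phi(-\epsilon/\mu - \mu/2)$ with $\mu$ equal to sensitivity divided by noise standard deviation. This is not the paper's one-shot estimator, only the kind of analytical reference value such an estimate might be compared against. The function names, the unit sensitivity, and the bisection bounds are assumptions made for illustration.

```python
import math


def std_normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_delta(eps, noise_multiplier):
    """delta(eps) for a Gaussian mechanism with sensitivity 1 (assumed)
    and noise standard deviation equal to noise_multiplier."""
    mu = 1.0 / noise_multiplier
    return (std_normal_cdf(-eps / mu + mu / 2.0)
            - math.exp(eps) * std_normal_cdf(-eps / mu - mu / 2.0))


def gaussian_epsilon(noise_multiplier, delta, hi=100.0, iters=100):
    """Smallest eps with delta(eps) <= delta, found by bisection.

    Assumes the answer lies in [0, hi]; the default upper bound is an
    illustrative choice and may be too small for very low noise.
    """
    if gaussian_delta(0.0, noise_multiplier) <= delta:
        return 0.0
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gaussian_delta(mid, noise_multiplier) > delta:
            lo = mid  # still too much delta: eps must be larger
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical settings: noise multipliers 1 and 2, delta = 1e-5.
    for sigma in (1.0, 2.0):
        print(sigma, gaussian_epsilon(sigma, 1e-5))
```

Bisection works here because $\delta(\epsilon)$ is decreasing in $\epsilon$, so a larger noise multiplier always yields a smaller $\epsilon$ at the same $\delta$.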

Weaknesses

- Using canary clients seems more inefficient compared to using canary examples. More resource allocation might be needed.
- In the paper it is suggested that "In production settings, a simple and effective strategy would be to designate a small fraction of real clients to have their model updates replaced with the canary update whenever they participate." but this would destroy the representation of such clients and be problematic, especially in data-heterogeneous settings. It would also result

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 4

Strengths

The paper is very well-written and easy to understand. The proposed method is fairly simple, so it should be easy to implement, and potentially to extend to other settings. The main idea of randomly sampling canaries that are orthogonal to everything else with high probability is novel to my knowledge. Being able to estimate the privacy loss in a single training run, with minimal effect on the target model's accuracy, makes the method fairly practical.
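
The near-orthogonality mentioned above can be illustrated with a small sketch: independently sampled random unit vectors in high dimension have pairwise cosines that concentrate around zero, with typical magnitude on the order of $1/\sqrt{d}$. The code below is not taken from the paper or its repository; the helper names, dimensions, canary counts, and seed are hypothetical choices for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a uniformly random direction by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_abs_cosine(dim, num_canaries, seed=0):
    """Largest |cosine| over all pairs of independently sampled unit vectors."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    canaries = [random_unit_vector(dim, rng) for _ in range(num_canaries)]
    worst = 0.0
    for i in range(num_canaries):
        for j in range(i + 1, num_canaries):
            # Both vectors have unit norm, so the dot product is the cosine.
            cos = sum(a * b for a, b in zip(canaries[i], canaries[j]))
            worst = max(worst, abs(cos))
    return worst


if __name__ == "__main__":
    # Hypothetical dimensions; real model updates are far higher-dimensional.
    for dim in (4, 400, 4000):
        print(dim, max_abs_cosine(dim, num_canaries=10), dim ** -0.5)
```

In low dimension the sampled vectors overlap substantially, while in high dimension even the worst pair is close to orthogonal, which is the property the review attributes to the randomly sampled canaries.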

Weaknesses

The analytical privacy bound doesn't seem like a good "ground truth" for the comparison with CANIFE, as the analytical bound is expected to be much larger than necessary. The best $\epsilon$ to return would be the upper bound that any membership inference attack could achieve, which is of course not known. It is possible that your method is simply overestimating the best $\epsilon$, and is closer to the analytical because of that, and not because it is better at estimating the best $\epsilon$ th

Code & Models

Repositories

google-research/federated
tf · Official

Taxonomy

Topics: Privacy-Preserving Technologies in Data