Optimizing Canaries for Privacy Auditing with Metagradient Descent

Matteo Boglioni; Terrance Liu; Andrew Ilyas; Zhiwei Steven Wu

arXiv:2507.15836·cs.LG·July 22, 2025

TL;DR

This paper introduces a novel method for optimizing canary examples in black-box privacy auditing of differentially private models, significantly improving lower bounds on privacy parameters through metagradient optimization.

Contribution

It proposes a new approach to optimize canary sets for privacy auditing using metagradient descent, enhancing the effectiveness and transferability of privacy bounds in DP models.
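As a rough illustration of what "optimizing canaries with a metagradient" could mean, the toy sketch below differentiates a post-training quantity with respect to a canary's value by unrolling training. Everything in it is an assumption made for illustration: the scalar model, the squared loss, the "loss gap" objective, the clipping range, and the names `train`, `gap_and_metagradient`, and `optimize_canary` are hypothetical and are not the paper's objective, models, or implementation.

```python
def train(data, canary=None, steps=50, lr=0.1):
    """Gradient descent on mean squared error for a scalar model w.

    Returns the final w and dw/dz, the derivative of the final w with
    respect to the canary value z (0.0 when no canary is inserted).
    """
    points = list(data) + ([canary] if canary is not None else [])
    n = len(points)
    mean = sum(points) / n
    w, dw_dz = 0.0, 0.0
    for _ in range(steps):
        # Gradient of mean_j 0.5 * (w - y_j)^2 with respect to w is (w - mean).
        w = w - lr * (w - mean)
        if canary is not None:
            # Differentiate the update rule itself: mean depends on z via 1/n.
            dw_dz = dw_dz * (1.0 - lr) + lr / n
    return w, dw_dz


def gap_and_metagradient(data, z):
    """Toy objective: canary loss without the canary minus loss with it."""
    w_out, _ = train(data)                 # model that never saw the canary
    w_in, dw_dz = train(data, canary=z)    # model trained on the canary
    gap = 0.5 * (w_out - z) ** 2 - 0.5 * (w_in - z) ** 2
    # Chain rule through training: w_in depends on z, w_out does not.
    grad = -(w_out - z) - (w_in - z) * (dw_dz - 1.0)
    return gap, grad


def optimize_canary(data, z0, meta_steps=20, meta_lr=0.5, lo=-1.0, hi=1.0):
    """Gradient ascent on the gap, clipped to a hypothetical valid range."""
    z = z0
    for _ in range(meta_steps):
        _, grad = gap_and_metagradient(data, z)
        z = min(hi, max(lo, z + meta_lr * grad))
    return z
```

In this toy setting the optimized canary drifts away from the rest of the data, which widens the difference between the "trained with" and "trained without" models. Whether and how that intuition carries over to the paper's setting is not something this sketch establishes.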

Findings

01

Improved empirical lower bounds for DP image classifiers by over 2x.

02

Canaries optimized for small models remain effective for larger models.

03

Method is efficient and transferable across different model sizes.

Abstract

In this work we study black-box privacy auditing, where the goal is to lower bound the privacy parameter of a differentially private learning algorithm using only the algorithm's outputs (i.e., the final trained model). For DP-SGD (the most successful method for training differentially private deep learning models), the canonical approach to auditing uses membership inference: an auditor comes with a small set of special "canary" examples, inserts a random subset of them into the training set, and then tries to discern which of their canaries were included in the training set (typically via a membership inference attack). The auditor's success rate then provides a lower bound on the privacy parameters of the learning algorithm. Our main contribution is a method for optimizing the auditor's canary set to improve privacy auditing, leveraging recent work on metagradient optimization. Our empirical…
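The auditing game described in the abstract can be sketched in a few lines. This is a simplified illustration, not the paper's estimator: `toy_score` is a hypothetical stand-in for a membership score computed from a trained model, the median-threshold guessing rule is an assumption, and `epsilon_point_estimate` is only a point estimate with no confidence interval. It relies on the fact that, under pure ε-DP with each canary included independently with probability 1/2, no guesser can be correct with probability above e^ε / (1 + e^ε), so an observed accuracy `acc` suggests ε ≥ log(acc / (1 − acc)).

```python
import math
import random


def run_canary_game(num_canaries, score_fn, seed=0):
    """Include each canary with probability 1/2, then guess membership."""
    rng = random.Random(seed)
    included = [rng.random() < 0.5 for _ in range(num_canaries)]
    # In a real audit the scores would come from the trained model's outputs.
    scores = [score_fn(i, included[i], rng) for i in range(num_canaries)]
    threshold = sorted(scores)[num_canaries // 2]  # median score
    guesses = [s >= threshold for s in scores]
    correct = sum(g == m for g, m in zip(guesses, included))
    return correct, num_canaries


def epsilon_point_estimate(correct, total):
    """Heuristic estimate: accuracy <= e^eps / (1 + e^eps) under pure eps-DP."""
    acc = correct / total
    if acc <= 0.5:
        return 0.0
    if acc >= 1.0:
        return math.inf
    return math.log(acc / (1.0 - acc))


def toy_score(index, is_member, rng):
    # Hypothetical score: members tend to score higher than non-members.
    return rng.gauss(1.0 if is_member else 0.0, 1.0)


if __name__ == "__main__":
    correct, total = run_canary_game(1000, toy_score)
    print(correct, total, epsilon_point_estimate(correct, total))
```

A rigorous audit would replace the point estimate with a bound that holds with high probability and accounts for the δ term in (ε, δ)-DP.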

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

S1. This paper proposes a novel and effective approach to improve privacy auditing by actively optimizing canary examples. This significantly advances the SOTA in auditing effectiveness, enabling auditors to uncover more accurate privacy leakage estimates.

S2. The optimized canaries prove effective for auditing models trained both from scratch and via DP-finetuning, which establishes the method's utility in different practical settings of differentially private model deployment.

S3. This paper

Weaknesses

W1. The metagradient optimization process involves training a surrogate model multiple times within each meta-iteration. This paper does not discuss the computational overhead of this optimization phase, which could be a practical barrier for scenarios with limited resources.

W2. The empirical evaluation is exclusively confined to image classification tasks using CIFAR-10. The generalizability of this canary optimization approach to other data modalities or different machine learning tasks remai

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The core idea is original and promising. Although the method is too expensive to use on large architectures, the experiments show that it is sufficient to optimize the canary set with a small architecture and port them over. The writing is very clear and precise.

Weaknesses

I have two main concerns regarding soundness and contribution. First, the claim that the method can be used on a small architecture and generalized to a larger one is critical for the method to be of real utility, and therefore it needs more evidence. Could you experiment with more model sizes, or generalizing to a completely different architecture like ViT? Maybe at least one other data modality, like audio? Second, I think a simple baseline that is important to compare to would be maximizing

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- The paper is well-presented and clearly situates itself within the literature of DP (black-box) auditing and canary-based attacks.

- The explicit focus on last-iterate black-box auditing is an important and practically relevant problem and the idea of using optimized canaries to audit in this setting is an open problem.

Weaknesses

- Empirical results are limited as only a single dataset (CIFAR-10) is used alongside one non-DP experiment and one DP setting (with epsilon=8). It remains unclear how these gains using optimized canaries generalize to other datasets or differing privacy budgets, particularly high-privacy regimes.

- The paper claims efficiency of the canary optimization approach (using REPLAY) but does not quantify these overheads empirically or otherwise.

- This idea of using optimized canaries seems a straightf
