Differentially Private Auditing Under Strategic Response
Florian A. D. Burnat

TL;DR
This paper models privacy-constrained AI audits as a strategic game and analyzes optimal audit policies that account for the developer's response, with the aim of reducing the harm that audits fail to detect.
Contribution
It introduces a game-theoretic framework for privacy audits, characterizes optimal audit strategies, and proposes a gradient-based algorithm for strategic audit design.
Findings
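Naive DP auditing (uniform or harm-proportional budget allocation) induces a strictly larger welfare-weighted under-detection gap than any non-strategic mitigation baseline when, among other conditions, effective detectability is heterogeneous.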
Optimal audit allocation balances welfare, detectability, and mitigation costs.
The proposed SPAD algorithm effectively computes strategic audit policies.
Abstract
Regulatory audits of AI systems increasingly rely on differential privacy (DP) to protect training data and model internals. We study audit design when the audited developer can strategically respond to the privacy-constrained audit interface. We formalize privacy-constrained auditing as a bilevel Stackelberg game, in which an auditor commits to a query policy and DP budget allocation across harm dimensions, and a strategic developer reallocates mitigation efforts in response. We introduce the welfare-weighted under-detection gap, defined as the welfare-weighted true residual harm the audit fails to detect at the developer's strategic best response, and prove that naive DP auditing (uniform or harm-proportional allocation) induces a strictly larger gap than any non-strategic mitigation baseline whenever effective detectability is heterogeneous, the welfare weights are not comonotone with…
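The leader-follower structure described in the abstract can be illustrated with a small toy model. The sketch below is not the paper's formulation or its SPAD algorithm: the two harm dimensions, the exponential detectability and mitigation curves, the parameter values, and every function name are hypothetical choices made only for illustration. It uses brute-force grid search in place of any gradient-based method. The auditor commits to a split of a total DP budget, the developer best-responds by shifting mitigation effort to minimize detected harm, and the gap is the welfare-weighted residual harm that goes undetected.

```python
import math

def detect_prob(eps, sens):
    # ASSUMED detectability curve: more DP budget gives a higher detection chance.
    return 1.0 - math.exp(-sens * eps)

def residual_harm(base, effort):
    # ASSUMED mitigation curve: effort shrinks residual harm exponentially.
    return base * math.exp(-effort)

def developer_best_response(eps, sens, base, effort_total, steps=200):
    """Follower: split effort over two dimensions to minimize *detected* harm."""
    best = None
    for j in range(steps + 1):
        m = (effort_total * j / steps, effort_total * (steps - j) / steps)
        detected = sum(detect_prob(e, s) * residual_harm(b, x)
                       for e, s, b, x in zip(eps, sens, base, m))
        if best is None or detected < best[0]:
            best = (detected, m)
    return best[1]

def under_detection_gap(eps, sens, base, weights, effort_total):
    """Welfare-weighted undetected residual harm at the developer's best response."""
    m = developer_best_response(eps, sens, base, effort_total)
    return sum(w * (1.0 - detect_prob(e, s)) * residual_harm(b, x)
               for w, e, s, b, x in zip(weights, eps, sens, base, m))

def best_commitment(sens, base, weights, eps_total, effort_total, grid=20):
    """Leader: grid-search the budget split that minimizes the gap."""
    candidates = [(eps_total * (i / grid), eps_total * (1 - i / grid))
                  for i in range(grid + 1)]
    return min(candidates,
               key=lambda eps: under_detection_gap(eps, sens, base, weights, effort_total))

# Hypothetical parameters: dimension 1 is harder to detect but weighted more heavily.
SENS, BASE, WEIGHTS = (2.0, 0.5), (1.0, 1.0), (1.0, 3.0)
EPS_TOTAL, EFFORT_TOTAL = 2.0, 1.0

uniform = (0.5 * EPS_TOTAL, 0.5 * EPS_TOTAL)
strategic = best_commitment(SENS, BASE, WEIGHTS, EPS_TOTAL, EFFORT_TOTAL)
print("uniform gap:  ", under_detection_gap(uniform, SENS, BASE, WEIGHTS, EFFORT_TOTAL))
print("strategic gap:", under_detection_gap(strategic, SENS, BASE, WEIGHTS, EFFORT_TOTAL))
print("strategic split:", strategic)
```

Because the uniform split is one of the grid candidates, the grid-searched commitment can never do worse than uniform in this toy model. How much better it does depends entirely on the assumed curves and parameters, so the numbers carry no weight as evidence for the paper's claims.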
