Differentially Private Active Learning: Balancing Effective Data Selection and Privacy
Kristian Schwethelm, Johannes Kaiser, Jonas Kuntzer, Mehmet Yigitsoy, Daniel Rueckert, Georgios Kaissis

TL;DR
This paper introduces a differentially private active learning framework that balances effective data selection with privacy preservation, addressing key challenges in privacy budget management and data utilization.
Contribution
The paper proposes step amplification to improve data utilization in differentially private active learning (DP-AL) and evaluates acquisition functions under privacy constraints, advancing privacy-preserving active learning methods.
Findings
DP-AL can improve performance on certain datasets.
Naive DP-SGD integration faces privacy budget challenges.
AL's effectiveness is limited under strict privacy constraints.
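To make the two ingredients behind these findings concrete, here is a minimal sketch of one AL selection step and one DP-SGD gradient step. It is an illustration under stated assumptions, not the paper's implementation: predictive entropy is assumed as the acquisition function (one common choice, not necessarily among those the paper evaluates), the function names `entropy_scores`, `select_batch`, and `dp_sgd_gradient` are hypothetical, and step amplification is not shown.

```python
import numpy as np

def entropy_scores(probs):
    """Predictive entropy per unlabeled sample; higher means more uncertain."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)

def select_batch(probs, k):
    """Return indices of the k highest-entropy samples (assumed acquisition rule)."""
    return np.argsort(-entropy_scores(probs))[:k]

def dp_sgd_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Standard DP-SGD aggregation: clip each per-sample gradient to clip_norm,
    sum, add Gaussian noise with std noise_multiplier * clip_norm, then average."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_sample_grads)

# Usage: pick the 2 most uncertain samples, then take one noisy gradient.
probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.99, 0.01]])
chosen = select_batch(probs, k=2)
rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.0, 1.0]])
noisy_grad = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Every call to `dp_sgd_gradient` spends part of the privacy budget. Because AL retrains over several rounds, the steps of all rounds draw on the same total budget, which is one way to see why a naive combination runs into allocation problems.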
Abstract
Active learning (AL) is a widely used technique for optimizing data labeling in machine learning by iteratively selecting, labeling, and training on the most informative data. However, its integration with formal privacy-preserving methods, particularly differential privacy (DP), remains largely underexplored. While some works have explored differentially private AL for specialized scenarios like online learning, the fundamental challenge of combining AL with DP in standard learning settings has remained unaddressed, severely limiting AL's applicability in privacy-sensitive domains. This work addresses this gap by introducing differentially private active learning (DP-AL) for standard learning settings. We demonstrate that naively integrating DP-SGD training into AL presents substantial challenges in privacy budget allocation and data utilization. To overcome these challenges, we…
Taxonomy
Topics: Privacy-Preserving Technologies in Data
