Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective
Lukas Wutschitz, Boris Köpf, Andrew Paverd, Saravan Rajmohan, Ahmed Salem, Shruti Tople, Santiago Zanella-Béguelin, Menglin Xia, Victor Rühle

TL;DR
This paper proposes an information flow control approach to privacy in machine learning systems that leverages metadata such as access control policies, and compares fine-tuning and retrieval-based methods for providing user-level privacy guarantees.
Contribution
It introduces an information flow control framework for ML pipelines, enabling explicit privacy guarantees using metadata, and compares two user-level non-interference approaches.
Findings
Retrieval-augmented models outperform fine-tuning in utility and scalability.
Metadata-based control provides clear privacy guarantees.
Retrieval models satisfy strict non-interference while maintaining high performance.
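To make the non-interference finding concrete, here is a minimal sketch of retrieval gated by access-control metadata. It is an illustration under stated assumptions, not the authors' implementation: the `Document` class, the `allowed_users` field, the token-overlap `score` function, and the toy corpus are all hypothetical. The point it shows is that when filtering by access policy happens before ranking, changing a document the querying user cannot access leaves that user's retrieval result unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    text: str
    allowed_users: frozenset  # hypothetical encoding of an access policy


def score(query: str, doc: Document) -> int:
    """Toy relevance: count of distinct query tokens present in the document."""
    return len(set(query.lower().split()) & set(doc.text.lower().split()))


def retrieve(query: str, user: str, corpus: list, k: int = 2) -> list:
    """Return the top-k document texts that `user` is allowed to read."""
    # Filter BEFORE ranking, so inaccessible documents cannot influence
    # either the ordering or the content of the result.
    visible = [d for d in corpus if user in d.allowed_users]
    ranked = sorted(visible, key=lambda d: score(query, d), reverse=True)
    return [d.text for d in ranked[:k]]


# Illustrative corpus (made up for this sketch).
corpus = [
    Document("alice quarterly budget notes", frozenset({"alice"})),
    Document("bob salary review", frozenset({"bob"})),
    Document("shared budget template", frozenset({"alice", "bob"})),
]

# Same corpus, except Bob's private document has different content.
corpus_changed = [
    corpus[0],
    Document("bob budget secret plan", frozenset({"bob"})),
    corpus[2],
]

result_a = retrieve("budget notes", "alice", corpus)
result_b = retrieve("budget notes", "alice", corpus_changed)

# Alice's view is identical in both cases: Bob's private data does not
# flow into her retrieval result.
print(result_a == result_b)
```

In a retrieval-augmented setup, the returned texts would then be passed to the model as context for that user only. A fine-tuned model has no equivalent per-query filter, since training data is absorbed into the weights.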
Abstract
Modern machine learning systems use models trained on ever-growing corpora. Typically, metadata such as ownership, access control, or licensing information is ignored during training. Instead, to mitigate privacy risks, we rely on generic techniques such as dataset sanitization and differentially private model training, with inherent privacy/utility trade-offs that hurt model performance. Moreover, these techniques have limitations in scenarios where sensitive information is shared across multiple participants and fine-grained access control is required. By ignoring metadata, we therefore miss an opportunity to better address security, privacy, and confidentiality challenges. In this paper, we take an information flow control perspective to describe machine learning systems, which allows us to leverage metadata such as access control policies and define clear-cut privacy and…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
