Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective

Lukas Wutschitz; Boris Köpf; Andrew Paverd; Saravan Rajmohan; Ahmed Salem; Shruti Tople; Santiago Zanella-Béguelin; Menglin Xia; Victor Rühle

arXiv:2311.15792·cs.LG·November 28, 2023·1 cites

TL;DR

This paper proposes an information flow control approach to enhance privacy in machine learning systems by leveraging metadata like access policies, comparing fine-tuning and retrieval-based methods for user privacy guarantees.

Contribution

It introduces an information flow control framework for ML pipelines, enabling explicit privacy guarantees using metadata, and compares two user-level non-interference approaches.

Findings

1. Retrieval augmented models outperform fine-tuning in utility and scalability.

2. Metadata-based control provides clear privacy guarantees.

3. Retrieval models satisfy strict non-interference while maintaining high performance.
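
To make the non-interference finding concrete, here is a minimal Python sketch of access-controlled retrieval. It is an illustration only, not the paper's implementation: the `Document` class, the `retrieve` function, the word-overlap scoring, and the sample corpus are all hypothetical. The idea it shows is that if documents are filtered by the requesting user's access policy before ranking, then changing a document the user cannot read cannot change what that user's query returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical record: text plus access-control metadata."""
    text: str
    readers: frozenset  # names of users allowed to read this document


def retrieve(query, user, corpus, k=2):
    """Return up to k texts readable by `user`, ranked by word overlap.

    The access check runs BEFORE scoring, so documents the user cannot
    read never influence the ranking or the output.
    """
    visible = [d for d in corpus if user in d.readers]
    query_words = set(query.lower().split())

    def score(doc):
        return len(query_words & set(doc.text.lower().split()))

    ranked = sorted(visible, key=score, reverse=True)
    return [d.text for d in ranked[:k]]


# Illustrative corpus (invented for this sketch).
corpus = [
    Document("quarterly budget for team alpha", frozenset({"alice", "bob"})),
    Document("alice private notes on budget", frozenset({"alice"})),
    Document("public holiday schedule", frozenset({"alice", "bob", "carol"})),
]

# Change only the document that bob is not allowed to read.
modified = list(corpus)
modified[1] = Document("completely different secret budget text",
                       frozenset({"alice"}))

bob_before = retrieve("budget schedule", "bob", corpus)
bob_after = retrieve("budget schedule", "bob", modified)

# Non-interference for bob: his results do not depend on alice's private data.
print(bob_before == bob_after)
```

In a retrieval-augmented setup, the retrieved texts would then be passed to the model as context. A fine-tuned model gives no such simple argument, because every training document can influence the shared weights.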

Abstract

Modern machine learning systems use models trained on ever-growing corpora. Typically, metadata such as ownership, access control, or licensing information is ignored during training. Instead, to mitigate privacy risks, we rely on generic techniques such as dataset sanitization and differentially private model training, with inherent privacy/utility trade-offs that hurt model performance. Moreover, these techniques have limitations in scenarios where sensitive information is shared across multiple participants and fine-grained access control is required. By ignoring metadata, we therefore miss an opportunity to better address security, privacy, and confidentiality challenges. In this paper, we take an information flow control perspective to describe machine learning systems, which allows us to leverage metadata such as access control policies and define clear-cut privacy and…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques