TL;DR
Deep classifiers can be trained to secretly encode a sensitive attribute of their input into the outputs they produce for the target attribute. A malicious service provider can exploit this privacy risk even when users keep all internal representations hidden.
Contribution
This paper introduces an information-theoretical formulation and empirical methods for training classifiers that remain accurate on their target attribute while secretly encoding a sensitive attribute in their outputs.
Findings
Honest-but-curious (HBC) classifiers can predict the target attribute accurately while their outputs also reveal a sensitive attribute.
The attack works even if users have a full white-box view of the classifier, keep all internal representations hidden, and release only the classifier's outputs for the target attribute.
Detecting HBC classifiers is challenging, raising privacy concerns.
Abstract
It is known that deep neural networks, trained for the classification of non-sensitive target attributes, can reveal sensitive attributes of their input data through internal representations extracted by the classifier. We take a step forward and show that deep classifiers can be trained to secretly encode a sensitive attribute of their input data into the classifier's outputs for the target attribute, at inference time. Our proposed attack works even if users have a full white-box view of the classifier, can keep all internal representations hidden, and only release the classifier's estimations for the target attribute. We introduce an information-theoretical formulation for such attacks and present efficient empirical implementations for training honest-but-curious (HBC) classifiers: classifiers that can be accurate in predicting their target attribute, but can also exploit their…
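To make the idea of an output-level side channel concrete, here is a minimal sketch of one way a two-term training objective could be written. It is an illustration under stated assumptions, not the paper's actual formulation: the function `joint_loss`, the weights `beta_y` and `beta_s`, and the linear attack head `attack_weights` are all hypothetical names introduced here. The point it shows is that the attack head sees only the released softmax outputs, never any internal representation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of integer labels under row-wise probabilities."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def joint_loss(target_logits, attack_weights, y, s, beta_y=0.5, beta_s=0.5):
    """Hypothetical objective: target accuracy plus recoverability of s.

    target_logits : (n, K) classifier logits for the target attribute
    attack_weights: (K, M) assumed linear attack head applied to the
                    released outputs only
    y, s          : integer labels for the target and sensitive attributes
    """
    released = softmax(target_logits)                  # what the user releases
    attack_probs = softmax(released @ attack_weights)  # guess of s from outputs
    loss_y = cross_entropy(released, y)                # "honest" term
    loss_s = cross_entropy(attack_probs, s)            # "curious" term
    return beta_y * loss_y + beta_s * loss_s, loss_y, loss_s

# Toy data: 8 samples, 3 target classes, a binary sensitive attribute.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 3))
W = rng.normal(size=(3, 2))
y = rng.integers(0, 3, size=8)
s = rng.integers(0, 2, size=8)

total, loss_y, loss_s = joint_loss(logits, W, y, s)
print(f"total={total:.3f}  target={loss_y:.3f}  sensitive={loss_s:.3f}")
```

In a real training loop both the classifier and the attack head would be updated to lower this combined loss. Setting `beta_s=0` recovers an ordinary classifier objective, which is one way to see the trade-off between honesty and curiosity in this sketch.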
