Co(ve)rtex: ML Models as storage channels and their (mis-)applications

Md Abdullah Al Mamun; Quazi Mishkatul Alam; Erfan Shayegani; Pedram Zaree; Ihsen Alouani; Nael Abu-Ghazaleh

arXiv:2307.08811 · cs.LG · May 14, 2024

TL;DR

This paper models overparameterized ML models as storage channels, analyzing their capacity for embedding and retrieving information, and explores vulnerabilities and defenses related to data hiding and covert storage within models.

Contribution

The paper introduces an information-theoretic framework that treats ML models as storage channels, derives capacity bounds, and proposes methods for covert data embedding and retrieval.

Findings

01. Derived an upper bound on storage capacity based on unused parameters.

02. Developed methods for covert data embedding and extraction in ML models.

03. Proposed optimization techniques to enhance capacity while maintaining task performance.

Abstract

Machine learning (ML) models are overparameterized to support generality and avoid overfitting. The state of these parameters is essentially a "don't-care" with respect to the primary model, provided that this state does not interfere with the primary model. In both hardware and software systems, don't-care states and undefined behavior have been shown to be sources of significant vulnerabilities. In this paper, we propose a new information-theoretic perspective on the problem; we consider the ML model as a storage channel with a capacity that increases with overparameterization. Specifically, we consider a sender that embeds arbitrary information in the model at training time, which can be extracted by a receiver with black-box access to the deployed model. We derive an upper bound on the capacity of the channel based on the number of available unused parameters. We then explore…
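The sender/receiver setup described in the abstract can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a plain perceptron with more parameters than stored bits, a seed shared between sender and receiver, and no primary task at all. All names (`DIM`, `key_inputs`, `embed`, `extract`) are hypothetical and chosen only for illustration.

```python
import random

DIM = 64  # hypothetical parameter count, chosen to be much larger than the payload


def key_inputs(seed, n_bits, dim=DIM):
    """Regenerate the secret query inputs from a shared seed (one input per bit)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def predict(weights, x):
    """Black-box interface: exposes only the predicted label (0 or 1)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def embed(bits, seed, dim=DIM, max_epochs=1000):
    """Sender: train a perceptron until bit i is the label of key input i."""
    inputs = key_inputs(seed, len(bits), dim)
    weights = [0.0] * dim
    for _ in range(max_epochs):
        errors = 0
        for x, bit in zip(inputs, bits):
            if predict(weights, x) != bit:
                # Standard perceptron update toward the desired label.
                sign = 1.0 if bit == 1 else -1.0
                weights = [w + sign * xi for w, xi in zip(weights, x)]
                errors += 1
        if errors == 0:
            return weights
    raise RuntimeError("payload not memorized; use fewer bits or more parameters")


def extract(query, seed, n_bits, dim=DIM):
    """Receiver: recover the bits using only label queries to the deployed model."""
    return [query(x) for x in key_inputs(seed, n_bits, dim)]


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
model = embed(message, seed=2024)
recovered = extract(lambda x: predict(model, x), seed=2024, n_bits=len(message))
```

In this toy setting the receiver never inspects `model` directly; it only calls the query function, which mirrors the black-box assumption. A realistic version would also have to train on a primary task and check that the embedded payload does not degrade it, which this sketch omits.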

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting