Membership Inference Attacks Against Self-supervised Speech Models
Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee

TL;DR
This paper investigates privacy risks in self-supervised speech models and shows that they are vulnerable to membership inference attacks, which reveal whether a given utterance or speaker was part of the pre-training data.
Contribution
This is the first privacy analysis of SSL speech models using MIA. Under black-box access, the models are shown to leak membership information.
Findings
SSL speech models are vulnerable to MIA: the attacks achieve high Area Under the Curve (AUC).
Membership information can be inferred at both utterance and speaker levels.
Ablation studies identify factors influencing MIA success.
Abstract
Recently, adapting the idea of self-supervised learning (SSL) to continuous speech has started gaining attention. SSL models pre-trained on a huge amount of unlabeled audio can generate general-purpose representations that benefit a wide variety of speech processing tasks. Despite their ubiquitous deployment, however, the potential privacy risks of these models have not been well investigated. In this paper, we present the first privacy analysis of several SSL speech models using Membership Inference Attacks (MIA) under black-box access. The experimental results show that these pre-trained models are vulnerable to MIA and prone to membership information leakage, with high Area Under the Curve (AUC) at both the utterance level and the speaker level. Furthermore, we conduct several ablation studies to understand the factors that contribute to the success of MIA.
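The AUC metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes the attacker has already assigned each utterance a membership score (higher meaning "more likely seen during pre-training"); how that score is derived from the model's representations is not specified here and is left out. The function names `auc` and `speaker_level`, the averaging of utterance scores per speaker, and all numbers are hypothetical and for illustration only; they are not the paper's method or results.

```python
from collections import defaultdict
from statistics import mean


def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member.

    Ties count as 0.5. This is the rank-based definition of ROC AUC.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def speaker_level(scores, speakers):
    """Collapse utterance scores to one score per speaker by averaging.

    Averaging is an assumption made for this sketch.
    """
    grouped = defaultdict(list)
    for score, speaker in zip(scores, speakers):
        grouped[speaker].append(score)
    return {speaker: mean(vals) for speaker, vals in grouped.items()}


# Made-up membership scores; NOT results from the paper.
seen_scores = [0.9, 0.8, 0.7, 0.4]      # utterances in the pre-training set
seen_speakers = ["s1", "s1", "s2", "s2"]
unseen_scores = [0.6, 0.3, 0.5, 0.2]    # utterances outside the pre-training set
unseen_speakers = ["u1", "u1", "u2", "u2"]

# Utterance-level attack: compare individual utterance scores.
utterance_auc = auc(seen_scores, unseen_scores)

# Speaker-level attack: compare per-speaker aggregated scores.
seen_by_speaker = speaker_level(seen_scores, seen_speakers)
unseen_by_speaker = speaker_level(unseen_scores, unseen_speakers)
speaker_auc = auc(list(seen_by_speaker.values()), list(unseen_by_speaker.values()))

print(utterance_auc, speaker_auc)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of members from non-members, so "high AUC" means the scores distinguish seen from unseen data well. In this toy data, aggregating per speaker happens to separate the two groups more cleanly than individual utterances do.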
Taxonomy
Topics: Speech Recognition and Synthesis · Privacy-Preserving Technologies in Data
