Membership Inference Attacks Against Self-supervised Speech Models

Wei-Cheng Tseng; Wei-Tsung Kao; Hung-yi Lee

arXiv:2111.05113·cs.CR·August 16, 2022

TL;DR

This paper investigates privacy risks in self-supervised speech models, revealing their vulnerability to membership inference attacks that can leak sensitive training data information.

Contribution

The paper presents the first privacy analysis of SSL speech models using membership inference attacks (MIA) under black-box access, demonstrating that these models leak membership information.

Findings

01 · SSL speech models are vulnerable to MIA: the attacks achieve high AUC.

02 · Membership information can be inferred at both the utterance level and the speaker level.

03 · Ablation studies identify factors that influence the success of MIA.

Abstract

Recently, applying self-supervised learning (SSL) to continuous speech has started gaining attention. SSL models pre-trained on large amounts of unlabeled audio can generate general-purpose representations that benefit a wide variety of speech processing tasks. Despite their ubiquitous deployment, however, the potential privacy risks of these models have not been well investigated. In this paper, we present the first privacy analysis of several SSL speech models using Membership Inference Attacks (MIA) under black-box access. The experimental results show that these pre-trained models are vulnerable to MIA and prone to membership information leakage, with high Area Under the Curve (AUC) at both the utterance level and the speaker level. Furthermore, we conduct several ablation studies to understand the factors that contribute to the success of MIA.
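
As a rough illustration of how an MIA is typically scored with AUC, the sketch below assumes the attacker has already assigned each utterance a membership score (higher means "looks more like training data"). The scores, speaker labels, and the per-speaker averaging rule are all made-up assumptions for illustration; this is not the paper's attack or its aggregation method.

```python
from collections import defaultdict
from statistics import mean


def auc(member_scores, non_member_scores):
    """AUC as the probability that a random member outscores a random
    non-member, with ties counted as half a win."""
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


def speaker_level(scores, speakers):
    """Average utterance scores per speaker.
    Hypothetical aggregation rule, chosen only for this sketch."""
    grouped = defaultdict(list)
    for score, speaker in zip(scores, speakers):
        grouped[speaker].append(score)
    return [mean(values) for values in grouped.values()]


# Toy, invented scores for utterances seen / unseen during pre-training.
seen_scores = [0.9, 0.8, 0.7, 0.4]
seen_speakers = ["a", "a", "b", "b"]
unseen_scores = [0.6, 0.3, 0.2, 0.1]
unseen_speakers = ["c", "c", "d", "d"]

utterance_auc = auc(seen_scores, unseen_scores)
speaker_auc = auc(
    speaker_level(seen_scores, seen_speakers),
    speaker_level(unseen_scores, unseen_speakers),
)

print(f"utterance-level AUC: {utterance_auc}")  # 0.9375
print(f"speaker-level AUC:   {speaker_auc}")    # 1.0
```

An AUC of 0.5 would mean the scores carry no membership signal, while values near 1.0 mean seen and unseen data are almost perfectly separable.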

Code & Models

Repositories

raytzeng/s3m-membership-inference
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Privacy-Preserving Technologies in Data