Risk of re-identification for shared clinical speech recordings

Daniela A. Wiepert; Bradley A. Malin; Joseph R. Duffy; Rene L. Utianski; John L. Stricker; David T. Jones; and Hugo Botha

arXiv:2210.09975·eess.AS·August 23, 2023

TL;DR

This study assesses the privacy risk of re-identifying individuals from shared clinical speech recordings using speaker recognition systems, finding that risk decreases with larger search spaces and varies with speech type.

Contribution

It provides an empirical analysis of re-identification risk in speech data, highlighting factors that influence privacy vulnerability in healthcare datasets.

Findings

01. Re-identification risk decreases as the search space increases.

02. Non-connected speech recordings are harder to re-identify.

03. Overall re-identification risk in practice appears low.

Abstract

Large, curated datasets are required to leverage speech-based tools in healthcare. These are costly to produce, resulting in increased interest in data sharing. As speech can potentially identify speakers (i.e., voiceprints), sharing recordings raises privacy concerns. We examine the re-identification risk for speech recordings, without reference to demographic or metadata, using a state-of-the-art speaker recognition system. We demonstrate that the risk is inversely related to the number of comparisons an adversary must consider, i.e., the search space. Risk is high for a small search space but drops as the search space grows ($\text{precision} > 0.85$ for $< 1 \times 10^{6}$ comparisons, $\text{precision} < 0.5$ for $> 3 \times 10^{6}$ comparisons). Next, we show that the nature of a speech recording influences re-identification risk, with non-connected speech (e.g., vowel prolongation) being harder to identify. Our…
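
To make the terms "search space" and "precision" concrete, here is a minimal Python sketch. It is not the authors' pipeline: it assumes an adversary compares every known speaker embedding against every unknown recording embedding, accepts a pair as a match when a similarity score passes a threshold, and measures precision as the fraction of accepted pairs that are true matches. The function names (`cosine`, `match_precision`), the cosine scoring, the thresholds, and the toy embeddings are all hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_precision(known, unknown, threshold):
    """Score every (known, unknown) pair and accept those at or above threshold.

    known, unknown: dicts mapping a speaker id to an embedding (assumed format).
    Returns (n_comparisons, precision); precision is None if no pair is accepted.
    """
    true_pos = 0
    false_pos = 0
    for known_id, known_vec in known.items():
        for unknown_id, unknown_vec in unknown.items():
            if cosine(known_vec, unknown_vec) >= threshold:
                if known_id == unknown_id:
                    true_pos += 1   # accepted pair really is the same speaker
                else:
                    false_pos += 1  # accepted pair is a false match
    n_comparisons = len(known) * len(unknown)  # size of the search space
    accepted = true_pos + false_pos
    precision = true_pos / accepted if accepted else None
    return n_comparisons, precision

# Toy data (made up): speaker "a" appears in both sets, "b" and "c" do not overlap.
known = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
unknown = {"a": [0.9, 0.1], "c": [0.8, 0.6]}

for threshold in (0.7, 0.9):
    n, p = match_precision(known, unknown, threshold)
    print(f"threshold={threshold}: {n} comparisons, precision={p}")
```

In this toy setup the search space is the product of the two set sizes, so adding speakers to either set increases the number of chances for a false match to pass the threshold, which is one intuitive reading of why precision would fall as the search space grows.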

Code & Models

Repositories

neurology-ai-program/speech_risk (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis