On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer

Md Asif Jalal; Pablo Peso Parada; Jisi Zhang; Karthikeyan Saravanan; Mete Ozay; Myoungji Han; Jung In Lee; Seokyeong Jung

arXiv:2307.13343·eess.AS·July 26, 2023

Open Access

TL;DR

This paper introduces a privacy-preserving framework for on-device speaker anonymization in speech recognition, using gradient reversal layers to anonymize acoustic embeddings while maintaining high ASR accuracy.

Contribution

It proposes a novel on-device speaker anonymization method with flexible gradient reversal layers integrated into ASR models, enabling privacy without sacrificing recognition performance.

Findings

01. Reduces speaker recognition accuracy by 33% (relative).

02. Achieves a 6.2% relative Word Error Rate (WER) reduction.

03. Enables on-device privacy-preserving speech recognition.
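
Both headline numbers are relative changes, so it may help to spell out the arithmetic. The sketch below computes a relative reduction from a baseline value and a new value; the function name and the example WER figures are hypothetical illustrations and are not taken from the paper.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Return the relative reduction (baseline - new) / baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - new) / baseline


# Hypothetical WER values (in percent), chosen only to illustrate the formula.
baseline_wer = 10.0
new_wer = 9.38
rel = relative_reduction(baseline_wer, new_wer)
print(f"relative WER reduction: {rel:.1%}")
```

With these made-up inputs, an absolute drop of 0.62 WER points corresponds to a relative reduction of roughly 6.2%, which shows why a relative figure says nothing on its own about the absolute error rate.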

Abstract

Smart devices serviced by large-scale AI models necessitate user data transfer to the cloud for inference. For speech applications, this means transferring private user information, e.g., speaker identity. Our paper proposes a privacy-enhancing framework that targets speaker identity anonymization while preserving speech recognition accuracy for our downstream task: Automatic Speech Recognition (ASR). The proposed framework attaches flexible gradient reversal based speaker adversarial layers to target layers within an ASR model, where speaker adversarial training anonymizes acoustic embeddings generated by the targeted layers to remove speaker identity. We propose on-device deployment by execution of initial layers of the ASR model, and transmitting anonymized embeddings to the cloud, where the rest of the model is executed while preserving privacy. Experimental results show that our…
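
The gradient reversal idea mentioned in the abstract can be illustrated with a small, framework-free sketch. This is not the authors' implementation: the class name, the `lam` scaling factor, and the toy gradient vectors are all assumptions made for illustration. The layer is the identity in the forward pass and negates (and scales) the speaker-classifier gradient in the backward pass, so the targeted ASR layer is pushed away from encoding speaker identity while still following the ASR gradient.

```python
class GradientReversal:
    """Identity on the forward pass; multiplies gradients by -lam on the backward pass."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam  # reversal strength (hypothetical hyperparameter name)

    def forward(self, embedding):
        # Embeddings reach the speaker classifier unchanged.
        return list(embedding)

    def backward(self, grad_from_speaker_head):
        # The speaker-loss gradient is negated before it reaches the ASR layers.
        return [-self.lam * g for g in grad_from_speaker_head]


def combined_gradient(grad_asr, grad_speaker, grl):
    """Gradient at the targeted layer: ASR gradient plus the reversed speaker gradient."""
    reversed_grad = grl.backward(grad_speaker)
    return [a + r for a, r in zip(grad_asr, reversed_grad)]


# Toy values, assumed for illustration only.
grl = GradientReversal(lam=0.5)
embedding = [0.2, -1.0, 3.0]
out = grl.forward(embedding)
grad = combined_gradient([1.0, 1.0, 1.0], [2.0, -4.0, 0.0], grl)
print(out, grad)
```

In an autograd framework the same behavior would typically be written as a custom function with an overridden backward pass. Under the deployment described in the abstract, only the initial layers up to the targeted one would run on the device, and their embeddings would be transmitted to the cloud.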

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing