On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer
Md Asif Jalal, Pablo Peso Parada, Jisi Zhang, Karthikeyan Saravanan, Mete Ozay, Myoungji Han, Jung In Lee, Seokyeong Jung

TL;DR
This paper introduces a privacy-preserving framework for on-device speaker anonymization in speech recognition, using gradient reversal layers to anonymize acoustic embeddings while maintaining high ASR accuracy.
Contribution
It proposes a novel on-device speaker anonymization method with flexible gradient reversal layers integrated into ASR models, enabling privacy without sacrificing recognition performance.
Findings
Reduces speaker recognition accuracy by 33%.
Achieves 6.2% relative WER reduction.
Enables on-device privacy-preserving speech recognition.
Abstract
Smart devices serviced by large-scale AI models necessitate user data transfer to the cloud for inference. For speech applications, this means transferring private user information, e.g., speaker identity. Our paper proposes a privacy-enhancing framework that targets speaker identity anonymization while preserving speech recognition accuracy for our downstream task, Automatic Speech Recognition (ASR). The proposed framework attaches flexible gradient-reversal-based speaker adversarial layers to target layers within an ASR model, where speaker adversarial training anonymizes the acoustic embeddings generated by the targeted layers to remove speaker identity. We propose on-device deployment by executing the initial layers of the ASR model on the device and transmitting anonymized embeddings to the cloud, where the rest of the model is executed while preserving privacy. Experimental results show that our…
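The gradient reversal mechanism named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the class name `GradientReversal`, the strength parameter `lam`, and the helper `encoder_gradient` are hypothetical names chosen for illustration, and gradients are represented as plain lists of floats rather than tensors. The sketch shows only the core idea: the layer is an identity in the forward pass, and in the backward pass it flips and scales the gradient coming from a speaker classifier, so the layers below it are pushed to make speaker identity harder to predict while still following the ASR gradient.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    lam: float = 1.0  # assumed reversal strength, not a value taken from the paper

    def forward(self, x: List[float]) -> List[float]:
        # The embedding passes through unchanged to the speaker classifier.
        return list(x)

    def backward(self, grad_out: List[float]) -> List[float]:
        # The speaker-classifier gradient is sign-flipped and scaled.
        return [-self.lam * g for g in grad_out]


def encoder_gradient(
    grad_asr: List[float],
    grad_speaker: List[float],
    grl: GradientReversal,
) -> List[float]:
    """Gradient reaching the embedding at the layer where the adversarial branch is attached.

    grad_asr:     gradient of the ASR loss with respect to the embedding
    grad_speaker: gradient of the speaker-classification loss with respect to the embedding
    """
    reversed_speaker = grl.backward(grad_speaker)
    return [a + s for a, s in zip(grad_asr, reversed_speaker)]


if __name__ == "__main__":
    grl = GradientReversal(lam=0.5)
    print(encoder_gradient([1.0, -2.0], [0.5, 4.0], grl))
```

In an autodiff framework the same behavior would typically be written as a custom backward rule attached to whichever layer is targeted, and under the deployment described above only the layers up to that point would run on the device.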
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
