Encrypted Speech Recognition using Deep Polynomial Networks
Shi-Xiong Zhang, Yifan Gong, Dong Yu

TL;DR
This paper introduces a deep polynomial network (DPN) that enables encrypted speech recognition, allowing privacy-preserving cloud-based speech processing with minimal performance loss and increased security.
Contribution
It presents a novel encrypted speech recognition model using DPNs and a joint decoding framework, enhancing privacy without significant accuracy or latency trade-offs.
Findings
Effective encrypted speech recognition with small accuracy degradation
Can be trained on unencrypted speech features in the traditional way
Achieves practical privacy-preserving speech recognition on real datasets
Abstract
Cloud-based speech recognition APIs give developers and enterprises an easy way to create speech-enabled features in their applications. However, sending audio that contains personal or company-internal information to the cloud raises privacy and security concerns. The recognition results generated in the cloud may also reveal sensitive information. This paper proposes a deep polynomial network (DPN) that can be applied to encrypted speech as an acoustic model. It allows clients to send their data to the cloud in encrypted form so that it remains confidential, while the DPN can still make frame-level predictions over the encrypted speech and return them in encrypted form. One good property of the DPN is that it can be trained on unencrypted speech features in the traditional way. To keep the cloud away from the raw audio and recognition…
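To illustrate why a polynomial network can run over encrypted inputs, here is a minimal sketch, not the authors' implementation. It assumes a square activation (the abstract does not say which polynomial the DPN uses) and toy integer weights. The hypothetical `AddMulOnly` class is not an encryption scheme; it only mimics the restricted interface of a ciphertext, which typically supports addition and multiplication and nothing else. The same forward pass runs on plain numbers and on wrapped values because it never needs any other operation.

```python
class AddMulOnly:
    """Hypothetical stand-in for a ciphertext: only + and * are defined.

    This provides no security; it only enforces the operation set.
    """

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return AddMulOnly(self.value + _plain(other))

    __radd__ = __add__

    def __mul__(self, other):
        return AddMulOnly(self.value * _plain(other))

    __rmul__ = __mul__


def _plain(x):
    """Unwrap an AddMulOnly value; pass plain numbers through."""
    return x.value if isinstance(x, AddMulOnly) else x


def dense(inputs, weights, biases):
    """Affine layer: out[j] = sum_i inputs[i] * weights[j][i] + biases[j]."""
    outputs = []
    for row, bias in zip(weights, biases):
        acc = bias
        for x, w in zip(inputs, row):
            acc = x * w + acc
        outputs.append(acc)
    return outputs


def square(values):
    """Polynomial activation (assumed here): elementwise x * x."""
    return [v * v for v in values]


def polynomial_network(inputs, layers):
    """Dense + square for hidden layers, dense only for the output layer."""
    hidden = inputs
    for weights, biases in layers[:-1]:
        hidden = square(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    return dense(hidden, weights, biases)


# Toy weights, imagined as trained on unencrypted features beforehand.
LAYERS = [
    ([[1, -2], [3, 1]], [0, 1]),   # hidden layer: 2 inputs -> 2 units
    ([[2, 1], [-1, 1]], [1, 0]),   # output layer: 2 units -> 2 scores
]

features = [2, 1]  # one toy "frame" of features
plain_scores = polynomial_network(features, LAYERS)

# The same code path on wrapped inputs, as a server would see them.
wrapped_scores = polynomial_network([AddMulOnly(f) for f in features], LAYERS)
recovered_scores = [s.value for s in wrapped_scores]  # client-side "decryption"

print(plain_scores, recovered_scores)
```

Both calls produce the same scores. A network with ReLU or sigmoid activations would fail on the wrapped inputs, since those functions need comparisons or exponentials that the restricted interface does not offer.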
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Chaos-based Image/Signal Encryption
