Encrypted Speech Recognition using Deep Polynomial Networks

Shi-Xiong Zhang; Yifan Gong; Dong Yu

arXiv:1905.05605 · cs.CR · May 15, 2019 · 1 citation

TL;DR

This paper introduces a deep polynomial network (DPN) that enables encrypted speech recognition, allowing privacy-preserving cloud-based speech processing with minimal performance loss and increased security.

Contribution

It presents a novel encrypted speech recognition model using DPNs and a joint decoding framework, enhancing privacy without significant accuracy or latency trade-offs.

Findings

01. Effective encrypted speech recognition with small accuracy degradation

02. Supports training on unencrypted data in the traditional manner

03. Achieves practical privacy-preserving speech recognition on real datasets

Abstract

Cloud-based speech recognition APIs give developers and enterprises an easy way to add speech-enabled features to their applications. However, sending audio that contains personal or company-internal information to the cloud raises privacy and security concerns. The recognition results generated in the cloud may also reveal sensitive information. This paper proposes a deep polynomial network (DPN) that can be applied to encrypted speech as an acoustic model. Clients send their data to the cloud in encrypted form so that it remains confidential, while the DPN still makes frame-level predictions over the encrypted speech and returns them in encrypted form. A useful property of the DPN is that it can be trained on unencrypted speech features in the traditional way. To keep the cloud away from the raw audio and recognition…
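The idea of a network that can run on encrypted inputs can be illustrated with a minimal sketch. Encryption schemes that allow computation on ciphertexts (homomorphic encryption is assumed here; this excerpt does not name the scheme) typically support only additions and multiplications, so every operation in the forward pass must be a polynomial. The layer sizes, the weights, the square activation, and the names `affine`, `square`, and `dpn_forward` below are all hypothetical choices for illustration, not the architecture from the paper, and the code operates on plain numbers rather than real ciphertexts.

```python
# Illustrative sketch only: layer sizes, weights, and the square activation
# are assumptions, not the architecture described in the paper.

def affine(x, weights, bias):
    """Compute W x + b using only additions and multiplications."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def square(x):
    """Polynomial activation: elementwise x * x (no comparisons, no exp)."""
    return [xi * xi for xi in x]

def dpn_forward(x, layers):
    """Forward pass of a toy polynomial network.

    Hidden layers apply an affine map followed by the square activation.
    The last layer is affine only and returns unnormalised frame-level scores.
    """
    *hidden, last = layers
    for weights, bias in hidden:
        x = square(affine(x, weights, bias))
    weights, bias = last
    return affine(x, weights, bias)

# Hypothetical toy model: 2 input features -> 2 hidden units -> 2 output scores.
toy_layers = [
    ([[1, -1], [2, 0]], [0, 1]),   # hidden layer (weights, bias)
    ([[1, 1], [1, -1]], [0, 0]),   # output layer (weights, bias)
]

frame = [3, 2]                     # one frame of (plaintext) features
scores = dpn_forward(frame, toy_layers)
print(scores)                      # [50, -48]
```

Because the forward pass uses nothing but `+` and `*`, the same arithmetic could in principle be carried out on ciphertext objects that overload those two operators, while the weights themselves are learned beforehand on unencrypted features with ordinary training.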

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Chaos-based Image/Signal Encryption