Real-Time Neural Voice Camouflage
Mia Chiquier, Chengzhi Mao, Carl Vondrick

TL;DR
This paper introduces a real-time voice camouflage method that uses predictive adversarial attacks to disrupt automatic speech recognition systems such as DeepSpeech. The camouflage works over the air and does not inconvenience the conversation between people in the room.
Contribution
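It presents predictive attacks, a real-time technique that forecasts the attack that will be most effective in the future. Under real-time constraints, the method jams DeepSpeech 3.9x more than baselines as measured by word error rate and 6.6x more as measured by character error rate.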
Findings
Jams DeepSpeech 3.9x more than baselines by word error rate, and 6.6x more by character error rate
Demonstrates effectiveness over physical distances
Outperforms baseline adversarial attacks in real-time settings
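For readers unfamiliar with the metrics behind these figures, the following is a minimal sketch of how word error rate and character error rate are conventionally computed: the edit distance between a reference transcript and the recognizer's output, divided by the reference length. The function names are illustrative and this is not the authors' evaluation code; details such as text normalization are assumptions left out here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A more effective camouflage pushes both rates up, because the recognizer's transcript drifts further from what was actually said.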
Abstract
Automatic speech recognition systems have created exciting possibilities for applications; however, they also enable opportunities for systematic eavesdropping. We propose a method to camouflage a person's voice over the air from these systems without inconveniencing the conversation between people in the room. Standard adversarial attacks are not effective in real-time streaming situations because the characteristics of the signal will have changed by the time the attack is executed. We introduce predictive attacks, which achieve real-time performance by forecasting the attack that will be the most effective in the future. Under real-time constraints, our method jams the established speech recognition system DeepSpeech 3.9x more than baselines as measured through word error rate, and 6.6x more as measured through character error rate. We furthermore demonstrate our approach is…
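The timing constraint described in the abstract can be illustrated with a toy streaming loop: at each step the attacker sees only past samples, yet the perturbation it computes is applied to a later segment of the signal. This is a sketch of the scheduling idea only, under stated assumptions. The predictor here is a hypothetical placeholder rather than the paper's learned model, and the context length, delay, chunk size, and amplitude bound are arbitrary illustrative parameters.

```python
import numpy as np


def toy_predictor(context, horizon, eps):
    """Hypothetical stand-in for a forecasting model.

    Returns `horizon` perturbation samples bounded by `eps`, computed
    from past samples only. A real system would use a trained model.
    """
    return np.clip(-context[-horizon:], -eps, eps)


def stream_with_predictive_attack(signal, context_len, delay, chunk, eps,
                                  predictor=toy_predictor):
    """Add perturbations that are computed now but played `delay` samples later."""
    if context_len < chunk:
        raise ValueError("context_len must be at least chunk")
    out = signal.astype(float)  # astype returns a copy, input is untouched
    for start in range(context_len, len(signal) - delay, chunk):
        context = signal[start - context_len:start]  # past samples only
        lo = start + delay                           # where the attack lands
        hi = min(lo + chunk, len(signal))
        out[lo:hi] += predictor(context, hi - lo, eps)
    return out
```

For example, `stream_with_predictive_attack(x, context_len=200, delay=50, chunk=25, eps=0.01)` leaves the first 250 samples of `x` untouched, and every later perturbation depends only on audio observed at least 50 samples before it is played.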
Taxonomy
Topics: Speech and Audio Processing · Video Surveillance and Tracking Methods · Image and Signal Denoising Methods
