FakeWake: Understanding and Mitigating Fake Wake-up Words of Voice Assistants
Yanjiao Chen, Yijie Bai, Richard Mitev, Kaibo Wang, Ahmad-Reza Sadeghi, and Wenyuan Xu

TL;DR
This paper investigates the FakeWake phenomenon in voice assistants, introduces a fuzzy word generator, analyzes phonetic causes with a decision model, and proposes mitigation strategies to improve robustness.
Contribution
It presents the first automated fuzzy word generator, an interpretable phonetic analysis model, and effective mitigation methods for FakeWake vulnerabilities in voice assistants.
Findings
Generated 965 fuzzy words covering the 8 most popular English and Chinese smart speakers.
Identified phonetic features contributing to false wake-up triggers.
Strengthened models resist fuzzy words and perform better overall.
Abstract
In the area of Internet of Things (IoT), voice assistants have become an important interface to operate smart speakers, smartphones, and even automobiles. To save power and protect user privacy, voice assistants send commands to the cloud only if one of a small set of pre-registered wake-up words is detected. However, voice assistants are shown to be vulnerable to the FakeWake phenomena, whereby they are inadvertently triggered by innocent-sounding fuzzy words. In this paper, we present a systematic investigation of the FakeWake phenomena from three aspects. To start with, we design the first fuzzy word generator to automatically and efficiently produce fuzzy words instead of searching through a swarm of audio materials. We manage to generate 965 fuzzy words covering the 8 most popular English and Chinese smart speakers. To explain the causes underlying the FakeWake phenomena, we construct an…
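Illustration
The wake-up gating described in the abstract can be pictured with a toy sketch. This is not the paper's generator or any vendor's detector; it only assumes that a detector tolerating small phonetic deviations will accept some near-miss words. The phoneme sequences, the `is_wake_up` helper, and the `max_distance` threshold are all hypothetical and hand-written for illustration.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_wake_up(heard: Sequence[str], wake_word: Sequence[str],
               max_distance: int = 1) -> bool:
    """Toy gate: accept anything within max_distance phoneme edits (assumed threshold)."""
    return edit_distance(heard, wake_word) <= max_distance


# Hypothetical, hand-written phoneme sequences (not taken from the paper).
WAKE = ["AH", "L", "EH", "K", "S", "AH"]   # registered wake-up word
FUZZY = ["AH", "L", "EH", "K", "S", "IY"]  # near miss: one phoneme differs
UNRELATED = ["W", "EH", "DH", "ER"]        # ordinary word

if __name__ == "__main__":
    for name, seq in [("fuzzy", FUZZY), ("unrelated", UNRELATED)]:
        print(name, is_wake_up(seq, WAKE))
```

In this sketch the near-miss sequence passes the gate while the unrelated word does not, which mirrors the trade-off the abstract points to: a tolerant detector is easier to wake but also easier to trigger by accident.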
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis
