Towards Vulnerability Analysis of Voice-Driven Interfaces and Countermeasures for Replay

Khalid Mahmood Malik, Hafiz Malik, and Roland Baumann

arXiv:1904.06591·cs.CR·April 16, 2019

Open Access

TL;DR

This paper investigates the vulnerability of voice-driven interfaces to replay attacks and introduces a non-learning-based detection method using higher-order spectral analysis to identify replayed audio on smart speakers.

Contribution

It presents a novel framework that models replay attack distortion as a higher-order nonlinearity and uses higher-order spectral analysis (HOSA) to detect replay attacks without machine learning.
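To make the idea concrete, here is a minimal sketch of one common HOSA quantity, the squared bicoherence (a normalized bispectrum), estimated with the direct FFT method. It is not the authors' implementation: the function name `mean_bicoherence`, the segment length, the Hann window, the white-noise stand-in for speech, and the quadratic term `0.5 * clean ** 2` used to mimic a distorting playback chain are all assumptions chosen for illustration. This summary does not state which features or decision rule the paper uses.

```python
import numpy as np


def mean_bicoherence(x, nfft=128, hop=64):
    """Average squared bicoherence of x over the bin pairs (f1, f2).

    Direct FFT estimate: the triple product X(f1) X(f2) conj(X(f1 + f2))
    is accumulated over overlapping segments, then normalized so that
    every value lies in [0, 1].
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(nfft)
    k = np.arange(1, nfft // 2)            # positive-frequency bins, DC skipped
    f1, f2 = k[:, None], k[None, :]        # broadcast to a (f1, f2) grid
    num = np.zeros((k.size, k.size), dtype=complex)
    den_pair = np.zeros((k.size, k.size))
    den_sum = np.zeros((k.size, k.size))

    for start in range(0, len(x) - nfft + 1, hop):
        seg = x[start:start + nfft]
        X = np.fft.fft((seg - seg.mean()) * window)
        pair = X[f1] * X[f2]               # X(f1) X(f2)
        summed = X[f1 + f2]                # X(f1 + f2); index stays below nfft
        num += pair * np.conj(summed)
        den_pair += np.abs(pair) ** 2
        den_sum += np.abs(summed) ** 2

    bic = np.abs(num) ** 2 / (den_pair * den_sum + 1e-12)
    return float(bic.mean())


# Illustrative comparison on synthetic data (assumed, not from the paper):
# Gaussian noise stands in for the original signal, and a memoryless
# quadratic term stands in for the nonlinearity of a replay chain.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16384)
distorted = clean + 0.5 * clean ** 2

clean_score = mean_bicoherence(clean)
distorted_score = mean_bicoherence(distorted)
print(f"clean: {clean_score:.4f}  distorted: {distorted_score:.4f}")
```

A Gaussian signal has a bispectrum that is zero in expectation, so its estimated bicoherence stays near the noise floor set by the number of averaged segments, while the quadratic term introduces phase coupling that raises it. A detector built this way would compare such a statistic against a threshold; how to choose that threshold for real speech is outside what this sketch covers.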

Findings

01 Successful detection of replay attacks on Google Home and Amazon Alexa

02 Replay attack recordings effectively injected via drop-in conferencing

03 Proposed method outperforms traditional detection approaches

Abstract

Fake audio detection is expected to become an important research area in the field of smart speakers, such as Google Home and Amazon Echo, and of chatbots developed for these platforms. This paper presents the replay attack vulnerability of voice-driven interfaces and proposes a countermeasure to detect replay attacks on these platforms. It introduces a novel framework to model replay attack distortion and then uses a non-learning-based method for replay attack detection on smart speakers. The replay attack distortion is modeled as a higher-order nonlinearity in the replay attack audio. Higher-order spectral analysis (HOSA) is used to capture characteristic distortions in the replay audio. The effectiveness of the proposed countermeasure is evaluated on original speech as well as the corresponding replayed recordings. The replay attack recordings are successfully injected into the Google Home…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing