TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations
Xiaoxue Gao, Yiming Chen, Xianghu Yue, Yu Tsao, Nancy F. Chen

TL;DR
This paper introduces TTSlow, an adversarial method that intentionally slows down TTS systems in order to evaluate how robust their generation efficiency is to input perturbations, and it highlights vulnerabilities across different models and datasets.
Contribution
The paper presents TTSlow, the first attack method targeting TTS models to evaluate their robustness and efficiency, using novel adversarial strategies on text and speaker embeddings.
Findings
TTSlow effectively increases TTS generation time across multiple models and datasets.
The attack impacts speech intelligibility minimally while significantly slowing down synthesis.
The approach reveals vulnerabilities in both autoregressive and non-autoregressive TTS systems.
Abstract
Text-to-speech (TTS) has been extensively studied for generating high-quality speech from textual inputs, playing a crucial role in various real-time applications. For real-world deployment, ensuring stable and timely generation in TTS models against minor input perturbations is of paramount importance. Therefore, evaluating the robustness of TTS models against such perturbations, commonly known as adversarial attacks, is highly desirable. In this paper, we propose TTSlow, a novel adversarial approach specifically tailored to slow down the speech generation process in TTS systems. To induce long TTS waiting times, we design a novel efficiency-oriented adversarial loss that encourages an endless generation process. TTSlow encompasses two attack strategies targeting both text inputs and speaker embeddings. Specifically, we propose TTSlow-text, which utilizes a combination of homoglyphs-based and…
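As a rough illustration of what a homoglyph-based text perturbation looks like, the sketch below swaps selected Latin letters for visually similar Cyrillic ones. It is not the paper's implementation: the lookalike table and the function names `substitute_homoglyphs` and `candidate_positions` are hypothetical, and the step that decides which positions to perturb (guided in the paper by the adversarial loss) is not shown because the visible abstract does not describe it.

```python
# Hypothetical lookalike table (an assumption for illustration only):
# Latin letter -> visually similar Cyrillic letter.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small letter a
    "c": "\u0441",  # Cyrillic small letter es
    "e": "\u0435",  # Cyrillic small letter ie
    "o": "\u043e",  # Cyrillic small letter o
    "p": "\u0440",  # Cyrillic small letter er
    "x": "\u0445",  # Cyrillic small letter ha
}


def candidate_positions(text):
    """Return the indices of characters that have a lookalike in the table."""
    return [i for i, ch in enumerate(text) if ch in HOMOGLYPHS]


def substitute_homoglyphs(text, positions):
    """Replace the characters at the given indices with their lookalikes.

    Indices that are out of range or have no lookalike are left unchanged.
    """
    chars = list(text)
    for i in positions:
        if 0 <= i < len(chars) and chars[i] in HOMOGLYPHS:
            chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)


original = "open the door"
# Perturb only the first two eligible positions; a real attack would
# choose positions with a search procedure rather than taking the first ones.
perturbed = substitute_homoglyphs(original, candidate_positions(original)[:2])
```

The perturbed string looks almost identical to a human reader and has the same length, but it contains different code points, so a TTS front end may tokenize or normalize it differently from the original.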
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
