SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network
Kexin Wang, Jiahong Zhang, Yong Ren, Man Yao, Di Shang, Bo Xu, Guoqi Li

TL;DR
SpikeVoice demonstrates high-quality text-to-speech synthesis using spiking neural networks, effectively capturing long-term dependencies with novel attention mechanisms, achieving comparable results to traditional neural networks while significantly reducing energy consumption.
Contribution
This paper introduces SpikeVoice, which the authors describe as, to the best of their knowledge, the first TTS system based on SNNs, and proposes Spiking Temporal-Sequential Attention (STSA) to address long-term dependency challenges.
Findings
Achieves TTS quality comparable to ANN-based systems.
Uses only 10.5% of the energy of traditional neural networks.
Effective across multiple languages and speaker scenarios.
Abstract
Brain-inspired Spiking Neural Networks (SNNs) have demonstrated their effectiveness and efficiency in vision, natural language, and speech understanding tasks, indicating their capacity to "see", "listen", and "read". In this paper, we design SpikeVoice, which performs high-quality Text-To-Speech (TTS) via SNN, to explore the potential of SNN to "speak". A major obstacle to using SNN for such generative tasks lies in the demand for models to grasp long-term dependencies. The serial nature of spiking neurons, however, leads to the invisibility of information at future spiking time steps, limiting SNN models to capturing sequence dependencies solely within the same time step. We term this phenomenon "partial-time dependency". To address this issue, we introduce Spiking Temporal-Sequential Attention (STSA) in SpikeVoice. To the best of our knowledge, SpikeVoice is the first TTS work in…
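The "serial nature of spiking neurons" mentioned in the abstract can be illustrated with a minimal sketch. The code below assumes a generic leaky integrate-and-fire (LIF) neuron with a hard reset. The function name `lif_forward` and the parameters `tau`, `v_threshold`, and `v_reset` are hypothetical choices for illustration, not the neuron model or code used in SpikeVoice. It shows only that the output at a given time step is unaffected by inputs at later time steps.

```python
def lif_forward(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Run one generic LIF neuron over a list of input currents.

    Illustrative assumption: a textbook LIF update with hard reset,
    not necessarily the neuron used in the paper.
    """
    v = v_reset
    spikes = []
    for x in inputs:  # time steps are processed strictly in order
        # Leaky integration: the membrane potential moves toward the input.
        v = v + (x - (v - v_reset)) / tau
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


# Two input sequences that differ only at the last time step.
a = lif_forward([1.5, 1.5, 0.0, 3.0])
b = lif_forward([1.5, 1.5, 0.0, 0.0])

# The first three outputs match: a future input cannot influence
# earlier time steps.
print(a[:3] == b[:3])
```

Because each step depends only on the membrane state carried forward from earlier steps, information at future spiking time steps is invisible to the current step, which is the limitation the abstract refers to.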
Taxonomy
Topics: Speech Recognition and Synthesis · Robotics and Automated Systems
Methods: Softmax · Attention Is All You Need · Spiking Neural Networks
