Wavebender GAN: An architecture for phonetically meaningful speech manipulation
Gustavo Teodoro Döhler Beck, Ulme Wennberg, Zofia Malisz, Gustav Eje Henter

TL;DR
Wavebender GAN introduces a neural architecture that enables precise, controllable manipulation of speech features with high perceptual quality, addressing limitations of traditional signal-processing methods in speech science applications.
Contribution
The paper presents a novel neural network architecture that learns to control speech properties directly, improving flexibility and quality over legacy signal-processing techniques.
Findings
Effective control of pitch, formants, and voice quality in speech synthesis.
High perceptual quality of manipulated speech stimuli.
Potential for improved speech perception experiments.
Abstract
Deep learning has revolutionised synthetic speech quality. However, it has thus far delivered little value to the speech science community. The new methods do not meet the controllability demands that practitioners in this area require, e.g., in listening tests with manipulated speech stimuli. Instead, control of different speech properties in such stimuli is achieved by using legacy signal-processing methods. This limits the range, accuracy, and speech quality of the manipulations. Also, audible artefacts have a negative impact on the methodological validity of results in speech perception studies. This work introduces a system capable of manipulating speech properties through learning rather than design. The architecture learns to control arbitrary speech properties and leverages progress in neural vocoders to obtain realistic output. Experiments with copy synthesis and manipulation…
