Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems
Daniel Platnick, Bishoy Abdelnour, Eamon Earl, Rahul Kumar, Zahra Rezaei, Thomas Tsangaris, Faraj Lagum

TL;DR
This paper introduces Preset-Voice Matching (PVM), a novel regulated speech-to-speech translation framework that enhances privacy compliance and reduces misuse risk by matching input voices to preset voices instead of cloning.
Contribution
PVM is the first regulated S2ST framework that matches input voices to preset voices, avoiding cloning and ensuring compliance with privacy regulations.
Findings
Improves multi-speaker S2ST runtime efficiency
Enhances naturalness of synthesized speech
Reduces risks of voice cloning misuse
Abstract
In recent years, there has been increased demand for speech-to-speech translation (S2ST) systems in industry settings. Although successfully commercialized, cloning-based S2ST systems expose their distributors to liabilities when misused by individuals and can infringe on personality rights when exploited by media organizations. This work proposes a regulated S2ST framework called Preset-Voice Matching (PVM). PVM removes cross-lingual voice cloning from S2ST by first matching the input voice to a similar voice from a previously consenting speaker in the target language. With this separation, PVM avoids cloning the input speaker, ensuring that PVM systems comply with regulations and reduce the risk of misuse. Our results demonstrate that PVM can significantly improve S2ST system runtime in multi-speaker settings, as well as the naturalness of S2ST-synthesized speech. To our knowledge, PVM is the first explicitly regulated…
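The matching step described in the abstract can be pictured as a nearest-neighbour lookup over a bank of consenting preset voices. The sketch below is illustrative only and is not the authors' implementation: it assumes each voice is summarized by a fixed-length speaker embedding and that similarity is measured by cosine similarity. The function names `cosine_similarity` and `match_preset_voice` and the example embeddings are hypothetical, and the paper's actual matching criteria may differ. Target-language synthesis with the selected preset voice is out of scope here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def match_preset_voice(
    input_embedding: Sequence[float],
    presets: Dict[str, List[float]],
) -> Tuple[str, float]:
    """Hypothetical helper: pick the preset voice closest to the input speaker.

    `presets` maps a preset-voice id (assumed to belong to a consenting
    target-language speaker) to that voice's speaker embedding. The input
    voice itself is never cloned; only its embedding is compared.
    """
    if not presets:
        raise ValueError("no preset voices available")
    scored = (
        (voice_id, cosine_similarity(input_embedding, emb))
        for voice_id, emb in presets.items()
    )
    best_id, best_score = max(scored, key=lambda pair: pair[1])
    return best_id, best_score


if __name__ == "__main__":
    # Toy 3-dimensional embeddings, made up for illustration.
    presets = {
        "preset_a": [1.0, 0.0, 0.0],
        "preset_b": [0.0, 1.0, 0.0],
        "preset_c": [0.7, 0.7, 0.0],
    }
    voice_id, score = match_preset_voice([0.9, 0.1, 0.0], presets)
    print(voice_id)  # prints: preset_a
```

Because the preset bank is fixed, a system built this way could compute preset embeddings once, ahead of time, so each new input speaker costs only one embedding plus a scan over the bank.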
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
