Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems

Daniel Platnick; Bishoy Abdelnour; Eamon Earl; Rahul Kumar; Zahra Rezaei; Thomas Tsangaris; Faraj Lagum

arXiv:2407.13153·cs.CL·July 19, 2024

TL;DR

This paper introduces Preset-Voice Matching (PVM), a novel regulated speech-to-speech translation framework that enhances privacy compliance and reduces misuse risk by matching input voices to preset voices instead of cloning.

Contribution

To the authors' knowledge, PVM is the first explicitly regulated S2ST framework. It matches each input voice to a preset voice from a consenting speaker rather than cloning the input speaker, which keeps PVM systems compliant with privacy regulations.

Findings

01. Improves multi-speaker S2ST runtime efficiency

02. Enhances naturalness of synthesized speech

03. Reduces risks of voice cloning misuse

Abstract

In recent years, there has been increased demand for speech-to-speech translation (S2ST) systems in industry settings. Although successfully commercialized, cloning-based S2ST systems expose their distributors to liabilities when misused by individuals and can infringe on personality rights when exploited by media organizations. This work proposes a regulated S2ST framework called Preset-Voice Matching (PVM). PVM removes cross-lingual voice cloning in S2ST by first matching the input voice to a similar voice from a previously consenting speaker in the target language. With this separation, PVM avoids cloning the input speaker, ensuring PVM systems comply with regulations and reduce the risk of misuse. Our results demonstrate PVM can significantly improve S2ST system run-time in multi-speaker settings and the naturalness of S2ST synthesized speech. To our knowledge, PVM is the first explicitly regulated…
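
The matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how voice similarity is measured, so everything here is an assumption made for illustration: voices are represented as fixed-length embedding vectors, similarity is cosine similarity, and the names `cosine_similarity`, `match_preset_voice`, and `PRESETS` are hypothetical and not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_preset_voice(input_embedding, preset_voices):
    """Return the name of the preset voice most similar to the input voice.

    Assumption: `preset_voices` maps a preset name to an embedding taken
    from a consenting target-language speaker. The input speaker is only
    compared against the presets; the input voice is never synthesized.
    """
    if not preset_voices:
        raise ValueError("no preset voices available")
    return max(
        preset_voices,
        key=lambda name: cosine_similarity(input_embedding, preset_voices[name]),
    )


# Hypothetical toy embeddings; real speaker embeddings would be much larger.
PRESETS = {
    "preset_a": [0.9, 0.1, 0.0],
    "preset_b": [0.1, 0.9, 0.2],
    "preset_c": [0.0, 0.2, 0.95],
}

best = match_preset_voice([0.2, 0.8, 0.1], PRESETS)
print(best)
```

Under these assumptions, downstream synthesis would use only the selected preset voice, which is the separation the abstract relies on to avoid cloning the input speaker.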

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems