Collaborative Watermarking for Adversarial Speech Synthesis

Lauri Juvela (Aalto University, Finland); Xin Wang (National Institute of Informatics, Japan)

arXiv:2309.15224 · eess.AS · January 3, 2024

Open Access

TL;DR

This paper introduces a collaborative training approach for watermarking neural speech synthesis, enhancing detection accuracy while maintaining speech quality, and demonstrating robustness against noise and distortions.

Contribution

It proposes a novel collaborative training scheme for speech watermarking that improves detection and robustness without degrading perceptual quality.

Findings

01. Collaborative training improves detection performance over traditional methods.

02. The approach enhances robustness against noise and time-stretching.

03. Listening tests show minimal impact on speech quality.

Abstract

Advances in neural speech synthesis have brought us technology that is not only close to human naturalness, but is also capable of instant voice cloning with little data, and is highly accessible with pre-trained models available. Naturally, the potential flood of generated content raises the need for synthetic speech detection and watermarking. Recently, considerable research effort in synthetic speech detection has been related to the Automatic Speaker Verification and Spoofing Countermeasure Challenge (ASVspoof), which focuses on passive countermeasures. This paper takes a complementary view to generated speech detection: a synthesis system should make an active effort to watermark the generated speech in a way that aids detection by another machine, but remains transparent to a human listener. We propose a collaborative training scheme for synthetic speech watermarking and show that…
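The collaborative idea in the abstract can be illustrated with a toy sketch. This is not the paper's system: Gaussian vectors stand in for speech features, a logistic classifier stands in for the detector, an additive vector `w` stands in for whatever watermark the synthesizer learns, and the L2 penalty `lam` is a crude placeholder for the requirement that the watermark stay transparent to listeners. All names and hyperparameters are hypothetical. The point it shows is that the generator side and the detector both descend the same detection loss, rather than opposing each other as in adversarial training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(collaborative, steps=1000, dim=16, batch=256, lr=0.05, lam=0.01, seed=0):
    """Train a toy detector; optionally let the 'generator' learn a watermark w."""
    rng = np.random.default_rng(seed)
    a = 0.01 * rng.standard_normal(dim)  # detector weights (hypothetical toy model)
    b = 0.0                              # detector bias
    w = np.zeros(dim)                    # additive watermark (assumption, not the paper's form)
    for _ in range(steps):
        real = rng.standard_normal((batch, dim))      # stand-in for natural speech
        gen = rng.standard_normal((batch, dim)) + w   # stand-in for watermarked synthetic speech
        p_real = sigmoid(real @ a + b)                # P(generated) on real samples, target 0
        p_gen = sigmoid(gen @ a + b)                  # P(generated) on generated samples, target 1
        # Gradients of the binary cross-entropy detection loss.
        grad_a = (p_real[:, None] * real).mean(0) + ((p_gen - 1.0)[:, None] * gen).mean(0)
        grad_b = p_real.mean() + (p_gen - 1.0).mean()
        # Collaborative step: the watermark minimizes the SAME loss,
        # plus an L2 penalty standing in for perceptual transparency.
        grad_w = (p_gen - 1.0).mean() * a + 2.0 * lam * w
        a -= lr * grad_a
        b -= lr * grad_b
        if collaborative:
            w -= lr * grad_w
    return a, b, w

def accuracy(a, b, w, n=2000, seed=1):
    """Detection accuracy on fresh real and generated samples."""
    rng = np.random.default_rng(seed)
    real = rng.standard_normal((n, a.size))
    gen = rng.standard_normal((n, a.size)) + w
    correct = (real @ a + b < 0).sum() + (gen @ a + b >= 0).sum()
    return correct / (2 * n)

if __name__ == "__main__":
    for mode in (False, True):
        a, b, w = train(collaborative=mode)
        print(f"collaborative={mode}: accuracy={accuracy(a, b, w):.3f}, |w|={np.linalg.norm(w):.2f}")
```

In this toy setting the two sample distributions are identical when `w` stays at zero, so the detector alone cannot beat chance; once the generator side is allowed to cooperate, detection becomes easy at the cost of a bounded perturbation. Real speech differs in that a passive detector already has vocoder artifacts to work with.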

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: HiFi-GAN