Attributable-Watermarking of Speech Generative Models

Yongbaek Cho; Changhoon Kim; Yezhou Yang; Yi Ren

arXiv:2202.08900·cs.SD·March 16, 2022

TL;DR

This paper proposes a watermarking technique for speech generative models that attributes synthetic speech to its source model with high accuracy, trading off robustness to watermark-removal attacks against speech quality.

Contribution

Building on model attribution in the image domain, it introduces algorithmic improvements for generating user-end speech models whose watermarks stay attributable under removal attempts while generation quality remains high.

Findings

01. High attribution accuracy achieved in speech models

02. Robust watermarks withstand removal attacks

03. Trade-off identified between watermark strength and speech quality

Abstract

Generative models can now synthesize images, speech, and videos that are hard to distinguish from authentic content. Such capabilities raise concerns such as malicious impersonation and IP theft. This paper investigates a solution for model attribution, i.e., the classification of synthetic content by its source model via watermarks embedded in the content. Building on the past success of model attribution in the image domain, we discuss algorithmic improvements for generating user-end speech models that empirically achieve high attribution accuracy while maintaining high generation quality. We show the trade-off between attributability and generation quality under a variety of attacks on generated speech signals that attempt to remove the watermarks, and the feasibility of learning watermarks that are robust against these attacks.
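To make the idea of attribution via embedded watermarks concrete, here is a minimal toy sketch. It is not the paper's algorithm: the abstract does not specify the embedding or detection mechanism, so the additive ±1 key, the correlation detector, the Gaussian-noise "attack", and all names (`make_key`, `embed`, `attribute`, `snr_db`, `attribution_accuracy`) are assumptions chosen only for illustration. Random Gaussian samples stand in for speech.

```python
import math
import random


def make_key(length, seed):
    """Hypothetical per-model key: a fixed pseudo-random +/-1 sequence."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(signal, key, strength):
    """Additively embed the key (assumed scheme); strength scales the watermark."""
    return [s + strength * k for s, k in zip(signal, key)]


def attribute(signal, keys):
    """Return the index of the key that correlates most strongly with the signal."""
    scores = [sum(s * k for s, k in zip(signal, key)) / len(signal) for key in keys]
    return max(range(len(keys)), key=lambda i: scores[i])


def snr_db(clean, marked):
    """Signal-to-watermark ratio in dB, a crude stand-in for generation quality.

    Requires marked != clean (nonzero strength), otherwise the ratio is undefined.
    """
    signal_power = sum(c * c for c in clean)
    distortion_power = sum((m - c) ** 2 for c, m in zip(clean, marked))
    return 10.0 * math.log10(signal_power / distortion_power)


def attribution_accuracy(strength, noise_sigma, n_models=4, length=2000, trials=20):
    """Fraction of noisy watermarked signals attributed to the correct source key."""
    keys = [make_key(length, seed=i) for i in range(n_models)]
    content_rng = random.Random(123)
    correct = 0
    for t in range(trials):
        source = t % n_models
        # Stand-in for generated speech: unit-variance Gaussian samples.
        clean = [content_rng.gauss(0.0, 1.0) for _ in range(length)]
        marked = embed(clean, keys[source], strength)
        # Assumed removal attack: additive Gaussian noise.
        attack_rng = random.Random(1000 + t)
        attacked = [m + attack_rng.gauss(0.0, noise_sigma) for m in marked]
        correct += attribute(attacked, keys) == source
    return correct / trials
```

In this toy setting, raising `strength` makes the correct key's correlation stand out more clearly under noise but lowers `snr_db`, which mirrors the trade-off between attributability and generation quality that the abstract describes. Calling `attribution_accuracy` over a grid of `strength` and `noise_sigma` values is one way to explore that trade-off.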

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Speech Recognition and Synthesis