VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding

Javier Nistal, Cyran Aouameur, Stefan Lattner, and Gaël Richard

arXiv:2105.01531 · cs.SD · August 2, 2021

TL;DR

VQCPC-GAN is an adversarial framework that generates variable-length audio by conditioning a GAN on a sequence of Vector-Quantized Contrastive Predictive Coding (VQCPC) tokens. The tokens supply step-wise, time-dependent features, while the input noise z stays fixed over time to keep global features temporally consistent.

Contribution

The paper proposes VQCPC-GAN, a method for variable-length adversarial audio synthesis in which a sequence of VQCPC tokens extracted from real audio serves as the conditional input to the GAN.

Findings

01

VQCPC-GAN achieves comparable performance to strong baselines in variable-length audio synthesis.

02

Keeping the input noise z fixed over time maintains temporal consistency of global features in the generated audio.

03

Experimental results demonstrate the effectiveness of using VQCPC tokens as conditional inputs.

Abstract

Influenced by the field of Computer Vision, Generative Adversarial Networks (GANs) are often adopted for the audio domain using fixed-size two-dimensional spectrogram representations as the "image data". However, in the (musical) audio domain, it is often desired to generate output of variable duration. This paper presents VQCPC-GAN, an adversarial framework for synthesizing variable-length audio by exploiting Vector-Quantized Contrastive Predictive Coding (VQCPC). A sequence of VQCPC tokens extracted from real audio data serves as conditional input to a GAN architecture, providing step-wise time-dependent features of the generated content. The input noise z (characteristic in adversarial architectures) remains fixed over time, ensuring temporal consistency of global features. We evaluate the proposed model by comparing a diverse set of metrics against various strong baselines. Results…
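The conditioning scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_generator_input`, the one-hot token encoding, the concatenation layout, and all dimensions are assumptions chosen for illustration. It only shows the idea that the token sequence sets the output length and varies per step, while the same noise vector z is repeated at every step.

```python
import random


def build_generator_input(tokens, z, codebook_size):
    """Return one conditioning vector per time step: [z ; one_hot(token)].

    Hypothetical layout: the same noise vector z is copied to every step,
    and only the one-hot token part changes over time.
    """
    frames = []
    for token in tokens:
        if not 0 <= token < codebook_size:
            raise ValueError(f"token {token} outside codebook of size {codebook_size}")
        one_hot = [0.0] * codebook_size
        one_hot[token] = 1.0
        frames.append(list(z) + one_hot)
    return frames


# One global noise vector, sampled once (dimension 4 is arbitrary).
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]

# Two token sequences of different length give inputs of different length,
# so the number of generated frames follows the conditioning sequence.
short = build_generator_input([2, 2, 5], z, codebook_size=8)
long = build_generator_input([2, 2, 5, 1, 0, 7], z, codebook_size=8)
```

In this sketch, `short` has three steps and `long` has six, yet every step begins with the same z, which is how a fixed noise vector can carry global features while the tokens carry the time-dependent ones.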

Code & Models

Repositories

SonyCSLParis/vqcpc-gan
Official

Taxonomy

Topics: Music and Audio Processing · Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies

Methods: InfoNCE · Contrastive Predictive Coding