Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations

Jan Zuiderveld; Marco Federici; Erik J. Bekkers

arXiv:2111.08462 · cs.SD · December 3, 2021

Open Access

TL;DR

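This paper explores Conditional Implicit Neural Representations (CINRs) as lightweight backbones for controllable audio synthesis. Small Periodic Conditional INRs (PCINRs) learn faster and generally produce quantitatively better reconstructions than Transposed Convolutional Neural Networks with equal parameter counts, but their performance is very sensitive to activation scaling hyperparameters.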

Contribution

It evaluates PCINRs as efficient backbones for audio synthesis and analyzes their performance, hyperparameter sensitivity, and noise characteristics, providing guidance for future improvements.

Findings

01

PCINRs learn faster and generally produce quantitatively better audio reconstructions than Transposed CNNs with equal parameter counts.

02

Performance of PCINRs is highly sensitive to activation scaling hyperparameters.

03

Regularization and reduced depth mitigate high-frequency noise in PCINR reconstructions.

Abstract

The high temporal resolution of audio and our perceptual sensitivity to small irregularities in waveforms make synthesizing at high sampling rates a complex and computationally intensive task, prohibiting real-time, controllable synthesis within many approaches. In this work we aim to shed light on the potential of Conditional Implicit Neural Representations (CINRs) as lightweight backbones in generative frameworks for audio synthesis. Our experiments show that small Periodic Conditional INRs (PCINRs) learn faster and generally produce quantitatively better audio reconstructions than Transposed Convolutional Neural Networks with equal parameter counts. However, their performance is very sensitive to activation scaling hyperparameters. When learning to represent more uniform sets, PCINRs tend to introduce artificial high-frequency components in reconstructions. We validate this noise…
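To make the idea of a periodic conditional INR more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a SIREN-style network (sine activations with a frequency scale `omega_0`, standing in here for the "activation scaling hyperparameters" mentioned above) and assumes conditioning by simply concatenating a latent vector `z` to the time coordinate. The function names, layer sizes, initialisation, and conditioning mechanism are all hypothetical choices made for illustration.

```python
import numpy as np


def init_layer(rng, fan_in, fan_out, omega_0, first=False):
    # SIREN-style uniform initialisation (assumed here, not taken from the paper).
    bound = 1.0 / fan_in if first else np.sqrt(6.0 / fan_in) / omega_0
    weights = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    bias = np.zeros(fan_out)
    return weights, bias


def make_params(rng, latent_dim, width=64, depth=3, omega_0=30.0):
    # Input is one time coordinate plus the conditioning vector (assumed design).
    sizes = [1 + latent_dim] + [width] * depth
    params = [
        init_layer(rng, sizes[i], sizes[i + 1], omega_0, first=(i == 0))
        for i in range(depth)
    ]
    # Final linear layer maps hidden features to a single waveform amplitude.
    params.append(init_layer(rng, width, 1, omega_0))
    return params


def pcinr_forward(t, z, params, omega_0=30.0):
    # t: (N,) time coordinates in [-1, 1]; z: (D,) conditioning vector.
    x = np.concatenate([t[:, None], np.tile(z, (t.shape[0], 1))], axis=1)
    *hidden, (w_out, b_out) = params
    for weights, bias in hidden:
        # Periodic activation; omega_0 scales the pre-activation frequency.
        x = np.sin(omega_0 * (x @ weights + bias))
    return (x @ w_out + b_out)[:, 0]


rng = np.random.default_rng(0)
params = make_params(rng, latent_dim=4)
t = np.linspace(-1.0, 1.0, 16000)       # one amplitude per queried time point
z_a = rng.normal(size=4)                # two different conditioning vectors
z_b = rng.normal(size=4)
wave_a = pcinr_forward(t, z_a, params)
wave_b = pcinr_forward(t, z_b, params)
```

Because the network maps a continuous time coordinate to an amplitude, the same weights can be queried at any sampling rate by changing `t`, and changing `z` selects a different waveform. Under these assumptions, varying `omega_0` changes the frequency content of the hidden features, which is one plausible reading of why such models would be sensitive to activation scaling.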

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing