Audio representations for deep learning in sound synthesis: A review

Anastasia Natsiou; Sean O'Leary

arXiv:2201.02490 · cs.SD · January 10, 2022

TL;DR

This review paper discusses various audio representations used in deep learning-based sound synthesis, highlighting how different representations influence model architecture choices, training efficiency, and sound quality evaluation.

Contribution

It provides a comprehensive overview of audio representations and their impact on deep learning sound synthesis architectures and evaluation methods.

Findings

1. Different audio representations affect model complexity and training time.
2. Transformations like feature extraction improve efficiency and perceptual relevance.
3. Evaluation metrics vary depending on the audio representation used.
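Finding 3 notes that evaluation metrics depend on the representation. As one illustration, the sketch below computes a log-spectral distance (LSD) between a reference and a generated waveform on their power spectrograms. It is a minimal sketch that assumes NumPy, a Hann-windowed STFT with n_fft=512 and hop=128, and the dB-scaled per-frame RMS form of LSD. The helper names `stft_power` and `log_spectral_distance` are hypothetical, and the review does not prescribe this exact variant.

```python
import numpy as np


def stft_power(x: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Power spectrogram |STFT|^2 with a Hann window, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop  # full frames only, no padding
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(ref: np.ndarray, est: np.ndarray, eps: float = 1e-10) -> float:
    """Per-frame RMS of the dB difference between power spectra, averaged over frames."""
    diff_db = 10.0 * (np.log10(stft_power(ref) + eps) - np.log10(stft_power(est) + eps))
    return float(np.mean(np.sqrt(np.mean(diff_db ** 2, axis=1))))


rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)              # 1 s of white noise at an assumed 16 kHz
print(log_spectral_distance(ref, ref))        # 0.0: identical signals
print(log_spectral_distance(ref, 0.5 * ref))  # close to 6.02 dB
```

Halving the amplitude scales power by 1/4, so every time-frequency bin differs by about 10·log10(4) ≈ 6.02 dB. Because the metric ignores phase, two waveforms with identical STFT magnitudes but different phase would score 0, which is one reason waveform-domain and spectrogram-domain metrics can disagree.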

Abstract

The rise of deep learning algorithms has led many researchers to move away from classic signal processing methods for sound generation. Deep learning models have achieved expressive voice synthesis, realistic sound textures, and musical notes from virtual instruments. However, the most suitable deep learning architecture is still under investigation. The choice of architecture is tightly coupled to the audio representation. A sound's original waveform can be too dense and rich for deep learning models to handle efficiently, and this complexity increases training time and computational cost. Also, the raw waveform does not represent sound in the way it is perceived. Therefore, in many cases, the raw audio has been transformed into a compressed and more meaningful form using upsampling, feature extraction, or even by adopting a higher-level representation of the waveform. Furthermore,…
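The abstract's point that the raw waveform is dense, and is often replaced by a compressed, perceptually motivated form, can be made concrete with a log-mel spectrogram, one widely used feature-extraction step. The sketch below assumes NumPy, a 16 kHz sample rate, n_fft=1024, hop=256, 80 mel bands, and the HTK mel formula. All function names and parameter values are illustrative choices, not taken from the review.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale; other conventions (e.g. Slaney) exist.
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Triangular filters equally spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(x, sr=16000, n_fft=1024, hop=256, n_mels=80, eps=1e-10):
    """Hann-windowed STFT -> power -> mel filterbank -> natural log; shape (frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop  # full frames only, no padding
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (frames, n_mels)
    return np.log(mel + eps)


sr = 16000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s test tone
S = log_mel_spectrogram(x, sr=sr)
print(x.shape, S.shape)  # (16000,) (59, 80)
```

Under these settings the time axis shrinks from 16,000 samples to 59 frames, so a sequence model processes roughly 270 times fewer steps per second of audio. The trade-off is that phase is discarded, so a vocoder or phase-reconstruction step is needed to turn the representation back into a waveform.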

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing