An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms

Anastasia Natsiou; Luca Longo; Sean O'Leary

arXiv:2301.07665 · cs.SD · January 19, 2023

TL;DR

This paper explores the use of stacked convolutional autoencoders for compressing and reconstructing monophonic harmonic sounds from log-mel-spectrograms, demonstrating effective unsupervised audio representation learning.

Contribution

The paper introduces a novel application of convolutional autoencoders to audio compression and proposes a frequency-accuracy-based evaluation metric for harmonic sound reconstruction.

Findings

01

Autoencoders successfully reconstruct harmonic sounds from compressed representations

02

Hyper-parameter tuning improves reconstruction quality

03

Frequency accuracy correlates with perceived sound quality

Abstract

In audio processing applications, the generation of expressive sounds based on high-level representations is in high demand. These representations can be used to manipulate the timbre and influence the synthesis of creative instrumental notes. Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical instrument timbre compression. Unsupervised deep learning methods can achieve audio compression by training the network to learn a mapping from waveforms or spectrograms to low-dimensional representations. This study investigates the use of stacked convolutional autoencoders for the compression of time-frequency audio representations across a variety of instruments at a single pitch. Further exploration of hyper-parameters and regularization techniques is demonstrated to enhance the performance of the initial design. In an…
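To make the idea of compressing a spectrogram with stacked convolutions concrete, the sketch below tracks how the shape of a log-mel-spectrogram shrinks through a stack of strided convolutional layers and computes the resulting compression ratio. Everything specific here is an assumption for illustration, not the paper's configuration: the 128 x 64 input size, the four layers, their channel counts, kernel sizes, strides, and padding, and the helper names `conv_out` and `encoder_shapes`. Only the per-axis output-size formula is standard.

```python
def conv_out(size, kernel, stride, padding):
    """Output length along one axis for a standard (non-dilated) convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_shapes(freq_bins, frames, layers):
    """Return the (channels, freq, time) shape after each stacked conv layer.

    `layers` is a list of (out_channels, kernel, stride, padding) tuples;
    the same kernel, stride, and padding are applied on both axes.
    """
    shapes = [(1, freq_bins, frames)]  # single-channel spectrogram input
    f, t = freq_bins, frames
    for channels, kernel, stride, padding in layers:
        f = conv_out(f, kernel, stride, padding)
        t = conv_out(t, kernel, stride, padding)
        shapes.append((channels, f, t))
    return shapes


# Hypothetical encoder: four stride-2 layers, each halving both axes.
layers = [(8, 3, 2, 1), (16, 3, 2, 1), (32, 3, 2, 1), (4, 3, 2, 1)]
shapes = encoder_shapes(freq_bins=128, frames=64, layers=layers)

input_size = 128 * 64                      # values in the input spectrogram
c, f, t = shapes[-1]
latent_size = c * f * t                    # values in the compressed code
compression_ratio = input_size / latent_size

for shape in shapes:
    print(shape)
print("compression ratio:", compression_ratio)
```

With these assumed settings the latent code has shape (4, 8, 4), so 8192 input values are reduced to 128. A decoder would mirror the stack with transposed convolutions to map the code back to the original spectrogram size.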
