MTCRNN: A multi-scale RNN for directed audio texture synthesis

M. Huzaifah; L. Wyse

arXiv:2011.12596·cs.SD·November 26, 2020·6 cites

Open Access

TL;DR

This paper introduces MTCRNN, a multi-scale RNN model that captures complex audio textures across multiple timescales, enabling user-directed synthesis of environmental sounds like rain and wind.

Contribution

The paper presents a novel multi-scale RNN architecture with a conditioning strategy for user-directed audio texture synthesis, targeting sounds whose patterns span multiple timescales and are therefore hard to model with traditional methods.

Findings

01 Effective modeling of diverse environmental sounds

02 Enhanced synthesis quality with user control

03 Good performance on multiple datasets

Abstract

Audio textures are a subset of environmental sounds, often defined as having stable statistical characteristics within an adequately large window of time while possibly being unstructured locally. They include common everyday sounds such as rain, wind, and engines. Because these complex sounds contain patterns on multiple timescales, they are a challenge to model with traditional methods. We introduce a novel modelling approach for textures, combining recurrent neural networks trained at different levels of abstraction with a conditioning strategy that allows for user-directed synthesis. We demonstrate the model's performance on a variety of datasets, evaluate it on various metrics, and discuss some potential applications.
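The definition of a texture given in the abstract (statistics that are stable over a large enough window, yet locally unstructured) can be illustrated with a small numerical sketch. This is not the paper's model or evaluation: white noise is used here as a hypothetical stand-in for a texture, and the function names, the 16 kHz sample rate, and the window lengths are all assumptions chosen for illustration.

```python
import numpy as np

def windowed_rms(signal, window):
    """Return the RMS level of each non-overlapping window of `window` samples."""
    n_windows = len(signal) // window
    # Drop the trailing partial window, then split into equal frames.
    frames = signal[: n_windows * window].reshape(n_windows, window)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def relative_spread(values):
    """Coefficient of variation: standard deviation divided by mean."""
    return float(np.std(values) / np.mean(values))

# Hypothetical stand-in for a texture: one second of white noise at 16 kHz.
rng = np.random.default_rng(0)
texture = rng.standard_normal(16000)

# Short windows (1 ms): the level fluctuates noticeably from window to window.
short_spread = relative_spread(windowed_rms(texture, 16))
# Long windows (125 ms): the level is nearly the same in every window.
long_spread = relative_spread(windowed_rms(texture, 2000))

print(f"spread over 1 ms windows:   {short_spread:.3f}")
print(f"spread over 125 ms windows: {long_spread:.3f}")
```

The spread of the RMS level shrinks as the window grows, which is the sense in which a texture is statistically stable at a coarse timescale while remaining unpredictable at a fine one. Real textures such as rain or engines would add structure at intermediate timescales that white noise lacks.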

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing