Multichannel-based learning for audio object extraction

Daniel Arteaga; Jordi Pons

arXiv:2102.06142·cs.SD·December 22, 2021

TL;DR

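This paper introduces a deep learning method for audio object extraction that learns from the multichannel renders of object-based productions rather than from the audio objects themselves. This addresses the scalability challenge posed by productions with dozens of simultaneous objects and allows the problem to be formulated in a supervised or an unsupervised way.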

Contribution

It presents a novel deep learning framework that learns from multichannel renders, enabling scalable audio object extraction and defining new evaluation standards.

Findings

01 The method effectively handles dozens of simultaneous audio objects.

02 It outperforms baseline methods under certain conditions.

03 The approach supports both supervised and unsupervised learning modes.

Abstract

The current paradigm for creating and deploying immersive audio content is based on audio objects, which are composed of an audio track and position metadata. While rendering an object-based production into a multichannel mix is straightforward, the reverse process involves sound source separation and estimating the spatial trajectories of the extracted sources. Moreover, cinematic object-based productions are often composed of dozens of simultaneous audio objects, which poses a scalability challenge for audio object extraction. Here, we propose a novel deep learning approach to object extraction that learns from the multichannel renders of object-based productions, instead of directly learning from the audio objects themselves. This approach allows tackling the object scalability challenge and also offers the possibility to formulate the problem in a supervised or an unsupervised…
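
The idea of comparing renders instead of objects can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' renderer, model, or loss: `panning_gains` is a hypothetical toy panner on a horizontal ring of loudspeakers, `render` sums the panned objects into a multichannel mix, and `render_loss` is a plain mean squared error between two renders.

```python
import numpy as np


def panning_gains(azimuth, speaker_azimuths):
    """Toy panner (hypothetical, for illustration only).

    azimuth: (T,) object azimuth per sample, in radians.
    speaker_azimuths: (C,) loudspeaker azimuths, in radians.
    Returns gains of shape (C, T), power-normalized per sample.
    """
    diff = azimuth[None, :] - speaker_azimuths[:, None]
    # Loudspeakers more than 90 degrees away from the object get zero gain.
    gains = np.maximum(np.cos(diff), 0.0)
    norm = np.sqrt(np.sum(gains ** 2, axis=0, keepdims=True))
    return gains / np.maximum(norm, 1e-12)


def render(objects, azimuths, speaker_azimuths):
    """Render N objects into a C-channel mix.

    objects: (N, T) audio tracks; azimuths: (N, T) position metadata.
    Returns the mix, shape (C, T).
    """
    mix = np.zeros((len(speaker_azimuths), objects.shape[1]))
    for audio, az in zip(objects, azimuths):
        mix += panning_gains(az, speaker_azimuths) * audio[None, :]
    return mix


def render_loss(est_objects, est_azimuths, ref_mix, speaker_azimuths):
    """Mean squared error between the render of the estimate and a reference mix."""
    est_mix = render(est_objects, est_azimuths, speaker_azimuths)
    return float(np.mean((est_mix - ref_mix) ** 2))


# Usage: three synthetic objects rendered to a four-loudspeaker ring.
rng = np.random.default_rng(0)
speakers = np.deg2rad([45.0, 135.0, 225.0, 315.0])
objects = rng.standard_normal((3, 100))
azimuths = rng.uniform(0.0, 2.0 * np.pi, size=(3, 100))
reference_mix = render(objects, azimuths, speakers)
print(render_loss(objects, azimuths, reference_mix, speakers))
```

In this toy setup the render is a sum over objects, so the loss does not depend on the order in which the estimated objects are listed, and its size is fixed by the number of channels rather than the number of objects.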
