A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation
Karn N. Watcharasupat, Chih-Wei Wu, Yiwei Ding, Iroro Orife, Aaron J. Hipple, Phillip A. Williams, Scott Kramer, Alexander Lerch, and William Wolcott

TL;DR
This paper introduces a generalized neural network model for cinematic audio source separation that leverages psychoacoustic frequency scales, a novel loss function, and a flexible architecture to improve separation quality and computational efficiency.
Contribution
The work extends the Bandsplit RNN to handle various frequency partitions using psychoacoustic scales, introduces a new loss function, and employs a common-encoder setup for better performance and flexibility.
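To make the idea of an overcomplete, psychoacoustically spaced band partition concrete, here is a minimal sketch. It is not the authors' band definition: the HTK-style mel formula, the choice to let each band run from the previous band centre to the next one, and the names `hz_to_mel`, `mel_to_hz`, and `overlapping_mel_bands` are all assumptions made for illustration.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel mapping (one common convention; assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def overlapping_mel_bands(n_bands, n_bins, sample_rate):
    """Return a list of (lo, hi) FFT-bin ranges, with hi exclusive.

    Edge points are equally spaced on the mel scale. Band i spans from
    edge i to edge i + 2, so neighbouring bands overlap and most bins
    fall inside more than one band (an overcomplete partition).
    """
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    # n_bands + 2 edge points, from 0 Hz up to the Nyquist frequency.
    edges_hz = [mel_to_hz(mel_max * i / (n_bands + 1)) for i in range(n_bands + 2)]

    bands = []
    for i in range(n_bands):
        lo = int(math.floor(edges_hz[i] / nyquist * (n_bins - 1)))
        hi = int(math.ceil(edges_hz[i + 2] / nyquist * (n_bins - 1))) + 1
        bands.append((lo, min(hi, n_bins)))
    return bands


# Hypothetical settings: 2048-point FFT (1025 bins) at 44.1 kHz, 8 bands.
bands = overlapping_mel_bands(n_bands=8, n_bins=1025, sample_rate=44100)
total_width = sum(hi - lo for lo, hi in bands)
```

Because the summed band widths exceed the number of FFT bins, the partition is redundant: a bin near a band boundary is seen by two band-wise feature extractors rather than one. Lower bands come out narrower than higher ones, which mirrors the finer low-frequency resolution of psychoacoustic scales.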
Findings
Achieved state-of-the-art results on the Divide and Remaster dataset.
Outperformed the ideal ratio mask, an oracle baseline, on dialogue separation.
Reduced computational complexity with a shared encoder architecture.
Abstract
Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue, music, and effects stems from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psychoacoustically motivated frequency scales were used to inform the band definitions which are now defined with redundancy for more reliable feature extraction. A loss function motivated by the signal-to-noise ratio and the sparsity-promoting property of the 1-norm was proposed. We additionally exploit the information-sharing property of a common-encoder setup to reduce computational complexity during both training and inference, improve separation performance for hard-to-generalize classes of sounds, and allow flexibility during inference time with detachable decoders. Our best…
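The abstract describes the loss only as motivated by the signal-to-noise ratio and the 1-norm. One plausible reading, sketched below, replaces the squared 2-norms of a standard SNR with 1-norms and negates the result so that lower is better. The exact form, the stabilizer `eps`, and the function name `l1_snr_loss` are assumptions for illustration, not the paper's definition.

```python
import math


def l1_snr_loss(estimate, target, eps=1e-3):
    """1-norm analogue of a negative SNR in decibels (assumed form).

    Returns 10 * log10((||estimate - target||_1 + eps) / (||target||_1 + eps)).
    Lower values mean the estimate is closer to the target.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    error_norm = sum(abs(e - t) for e, t in zip(estimate, target))
    target_norm = sum(abs(t) for t in target)
    return 10.0 * math.log10((error_norm + eps) / (target_norm + eps))


# Toy signal: a close estimate should score lower than a poor one.
target = [1.0, -2.0, 3.0, 0.5]
close_loss = l1_snr_loss([1.1, -1.9, 3.0, 0.5], target)
poor_loss = l1_snr_loss([0.0, 0.0, 0.0, 0.0], target)
```

An all-zero estimate gives an error norm equal to the target norm, so the loss is 0 dB; better estimates drive it negative. In a real separator the same expression would typically be applied to sample or spectrogram tensors per stem, which this list-based sketch does not attempt.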
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Music and Audio Processing
