Upsampling layers for music source separation

Jordi Pons; Joan Serrà; Santiago Pascual; Giulio Cengarle; Daniel Arteaga; Davide Scaini

arXiv:2111.11773 · cs.SD · November 24, 2021 · 1 citation

Open Access

TL;DR

This paper investigates how upsampling layers, and the artifacts they introduce, affect music source separation. It benchmarks a large set of upsampling methods, including novel layers, and finds that filtering artifacts are perceptually preferable even though they tend to come with worse objective scores.

Contribution

It introduces and benchmarks new upsampling layers, including novel stretch, sinc, and learnable wavelet methods, analyzing their artifacts and effects on audio quality and model performance.

Findings

01. Filtering artifacts are perceptually preferable despite lower objective scores.

02. Different upsampling methods produce distinct spectral and tonal artifacts.

03. Novel wavelet-based upsampling layers are proposed and evaluated.

Abstract

Upsampling artifacts are caused by problematic upsampling layers and by the spectral replicas that emerge while upsampling. Depending on the upsampling layer used, such artifacts can be either tonal artifacts (additive high-frequency noise) or filtering artifacts (subtractive, attenuating some bands). In this work we investigate the practical implications of having upsampling artifacts in the resulting audio, by studying how different artifacts interact and assessing their impact on the models' performance. To that end, we benchmark a large set of upsampling layers for music source separation: different transposed and subpixel convolution setups, different interpolation upsamplers (including two novel layers based on stretch and sinc interpolation), and different wavelet-based upsamplers (including a novel learnable wavelet layer). Our results show that filtering artifacts,…
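The spectral replicas mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the paper's code: the function names (`zero_stuff`, `nearest_upsample`, `replica_ratio`) and the test signal are assumptions chosen for illustration. It upsamples a single sinusoid by a factor of 2 in two ways, by inserting zeros between samples (here taken as a simplified stand-in for the first step of a stride-2 transposed convolution) and by repeating each sample (nearest-neighbor interpolation), and then compares the magnitude of the spectral replica with that of the original tone.

```python
import numpy as np


def zero_stuff(x, factor=2):
    """Upsample by inserting (factor - 1) zeros after every sample."""
    y = np.zeros(len(x) * factor)
    y[::factor] = x
    return y


def nearest_upsample(x, factor=2):
    """Upsample by repeating every sample `factor` times."""
    return np.repeat(x, factor)


def replica_ratio(y, k, n):
    """Magnitude of the first spectral replica relative to the original tone.

    Assumes a 2x-upsampled signal of length 2 * n whose original tone sat at
    DFT bin k of the length-n input; the first replica then sits at bin n - k.
    """
    spec = np.abs(np.fft.rfft(y))
    return spec[n - k] / spec[k]


# Hypothetical test signal: one sinusoid that falls exactly on a DFT bin.
n, k = 64, 8
x = np.cos(2 * np.pi * k * np.arange(n) / n)

ratio_zero = replica_ratio(zero_stuff(x), k, n)
ratio_nearest = replica_ratio(nearest_upsample(x), k, n)

print(f"zero-stuffing replica/original:    {ratio_zero:.3f}")
print(f"nearest-neighbor replica/original: {ratio_nearest:.3f}")
```

In this toy setting, zero-stuffing leaves the replica as strong as the original tone (a ratio of about 1), so a high-frequency component appears that was not in the input unless later layers remove it. Sample repetition acts as a fixed low-pass filter: it reduces the replica to about 0.2 of the original, at the cost of also attenuating the upper part of the original band. Which of these behaviors a trained separation model ends up exhibiting is what the paper studies; the sketch only shows the signal-processing mechanism.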

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Acoustic Wave Phenomena Research

Methods: Convolution