# Spectrogram Feature Losses for Music Source Separation

**Authors:** Abhimanyu Sahai, Romann Weber, Brian McWilliams

arXiv: 1901.05061 · 2019-06-28

## TL;DR

This paper adds a high-level feature loss, computed from VGG features of spectrograms, to the standard pixel-level L2 loss used to train deep music source separation models, and shows that the added term improves separation quality over the pixel-level loss alone.

## Contribution

The paper demonstrates that adding a VGG-based feature loss term during training improves separation quality relative to a pure pixel-level L2 spectrogram loss, using MMDenseNet as the separation model.

## Key findings

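- Adding the feature loss term improves separation quality relative to a pure pixel-level L2 loss
- The improvement is shown for drums and vocals extraction with MMDenseNet
- Experiments use the musdb18 database, which covers a broad range of Western music genres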

## Abstract

In this paper we study deep learning-based music source separation, and explore using an alternative loss to the standard spectrogram pixel-level L2 loss for model training. Our main contribution is in demonstrating that adding a high-level feature loss term, extracted from the spectrograms using a VGG net, can improve separation quality vis-a-vis a pure pixel-level loss. We show this improvement in the context of the MMDenseNet, a State-of-the-Art deep learning model for this task, for the extraction of drums and vocal sounds from songs in the musdb18 database, covering a broad range of western music genres. We believe that this finding can be generalized and applied to broader machine learning-based systems in the audio domain.
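The loss structure described in the abstract, a pixel-level L2 term plus a high-level feature term, can be illustrated with a minimal, framework-free sketch. Everything named here is an assumption for illustration: `toy_features` is a hypothetical stand-in (2x2 average pooling) for the VGG feature maps used in the paper, and `feature_weight` and its default value are placeholders, since this summary does not state which VGG layers or weighting the authors used.

```python
from typing import Callable, List

# A magnitude spectrogram as a 2-D list: rows are frequency bins, columns are time frames.
Spectrogram = List[List[float]]


def mse(a: Spectrogram, b: Spectrogram) -> float:
    """Mean squared error between two equally shaped 2-D arrays."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def toy_features(spec: Spectrogram) -> Spectrogram:
    """Hypothetical feature extractor: non-overlapping 2x2 average pooling.

    This is NOT a VGG network; it only plays the role of a fixed function
    that maps a spectrogram to a coarser feature map.
    """
    pooled = []
    for i in range(0, len(spec) - 1, 2):
        row = []
        for j in range(0, len(spec[0]) - 1, 2):
            block = spec[i][j] + spec[i][j + 1] + spec[i + 1][j] + spec[i + 1][j + 1]
            row.append(block / 4.0)
        pooled.append(row)
    return pooled


def combined_loss(
    pred: Spectrogram,
    target: Spectrogram,
    features: Callable[[Spectrogram], Spectrogram] = toy_features,
    feature_weight: float = 0.5,  # assumed value, not taken from the paper
) -> float:
    """Pixel-level L2 term plus a weighted feature-space L2 term."""
    pixel_term = mse(pred, target)
    feature_term = mse(features(pred), features(target))
    return pixel_term + feature_weight * feature_term


if __name__ == "__main__":
    target = [[0.0] * 4 for _ in range(4)]
    uniform_error = [[1.0] * 4 for _ in range(4)]
    print(combined_loss(uniform_error, target))
```

In a real training setup the feature extractor would be a pretrained network applied to the predicted and target spectrograms, and the loss would be computed on tensors so that gradients flow back to the separation model. The sketch only shows how the two terms combine.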

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.05061/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1901.05061/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/1901.05061/full.md

---
Source: https://tomesphere.com/paper/1901.05061