# On the Importance of Temporal Context in Proximity Kernels: A Vocal Separation Case Study

**Authors:** Delia Fano Yela, Sebastian Ewert, Derry FitzGerald, Mark Sandler

arXiv: 1702.02130 · 2017-11-01

## TL;DR

This paper adds temporal context to the proximity kernels used in Kernel Additive Modelling. The extra context stabilizes the similarity search and improves vocal separation quality in cases where noise and other sound sources make single-frame comparisons unreliable.

## Contribution

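The paper extends the proximity kernels of Kernel Additive Modelling (KAM) with temporal context, so that similarity between bins no longer relies on metrics between single time frames. Evaluated on vocal separation, this extension improves separation quality over previous kernels.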

## Key findings

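- Considerable improvement in vocal separation quality compared to previous kernels
- Temporal context stabilizes the similarity search
- Single-frame similarity is unreliable in the presence of noise and other sound sources, and often selects incorrect frames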

## Abstract

Musical source separation methods exploit source-specific spectral characteristics to facilitate the decomposition process. Kernel Additive Modelling (KAM) models a source by applying robust statistics to time-frequency bins as specified by a source-specific kernel, a function defining similarity between bins. Kernels in existing approaches are typically defined using metrics between single time frames. In the presence of noise and other sound sources, however, information from a single frame turns out to be unreliable, and incorrect frames are often selected as similar. In this paper, we incorporate a temporal context into the kernel to provide additional information stabilizing the similarity search. Evaluated in the context of vocal separation, our simple extension led to a considerable improvement in separation quality compared to previous kernels.
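The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a k-nearest-frames kernel with squared Euclidean distance, a median as the robust statistic, and edge padding at the spectrogram boundaries, none of which is confirmed by this summary. The function names (`stack_context`, `nearest_frames`, `median_background`) and the `context` and `k` parameters are hypothetical. Setting `context=0` gives a single-frame similarity search, while `context > 0` compares each frame together with its neighbours.

```python
import numpy as np


def stack_context(mag, context):
    """Stack each frame with `context` neighbours on either side.

    mag: magnitude spectrogram, shape (n_bins, n_frames).
    Returns shape (n_bins * (2 * context + 1), n_frames).
    Edge padding at the boundaries is an assumption of this sketch.
    """
    n_frames = mag.shape[1]
    padded = np.pad(mag, ((0, 0), (context, context)), mode="edge")
    # Block j holds the spectrogram shifted by j frames.
    shifted = [padded[:, j:j + n_frames] for j in range(2 * context + 1)]
    return np.concatenate(shifted, axis=0)


def nearest_frames(mag, k, context=0):
    """For every frame, return the indices of its k most similar frames.

    Similarity is squared Euclidean distance between context-stacked
    frames. A frame is normally among its own neighbours (distance 0).
    """
    feats = stack_context(mag, context)
    sq = np.sum(feats ** 2, axis=0)
    # Pairwise squared distances between all frames.
    dist = sq[:, None] + sq[None, :] - 2.0 * (feats.T @ feats)
    return np.argsort(dist, axis=1)[:, :k]


def median_background(mag, neighbours):
    """Estimate the repeating (non-vocal) part with a median.

    mag[:, neighbours] has shape (n_bins, n_frames, k); the median is
    taken over the k similar frames. The estimate is clipped so it never
    exceeds the mixture magnitude.
    """
    estimate = np.median(mag[:, neighbours], axis=2)
    return np.minimum(estimate, mag)
```

On a toy sequence of frames that repeats the pattern A, B, A, C, a single-frame search cannot tell the two A frames of a period apart, whereas `context=1` matches an A frame only with other A frames that share the same neighbours. A vocal estimate could then be obtained by subtracting the background estimate from the mixture magnitude, or by building a soft mask from it; that step is left out here.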

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.02130/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1702.02130/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1702.02130/full.md

---
Source: https://tomesphere.com/paper/1702.02130