# Joining Sound Event Detection and Localization Through Spatial Segregation

**Authors:** Ivo Trowitzsch, Christopher Schymura, Dorothea Kolossa, Klaus Obermayer

arXiv: 1904.00055 · 2019-12-24

## TL;DR

This paper joins sound event detection and localization in a binaural robotic system through spatial stream segregation, and analyzes how detection on the segregated streams holds up with multiple co-occurring sources, localization error, and error in the estimated number of active sources.

## Contribution

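The paper presents a joint approach in which spatial stream segregation produces probabilistic time-frequency masks for individual sources attributable to separate locations; sound event detection then operates on the resulting segregated streams, binding each detected sound type to a location.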
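
The masking idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `segregate_streams`, the input layout (a per-bin posterior over azimuth bins from some unspecified localizer), and the normalization of masks across sources are all assumptions made for illustration.

```python
import numpy as np

def segregate_streams(tf_features, azimuth_posteriors, source_azimuth_bins):
    """Split a time-frequency representation into per-source streams.

    Hypothetical inputs (assumptions, not taken from the paper):
      tf_features:         (T, F) non-negative time-frequency magnitudes
      azimuth_posteriors:  (T, F, A) per-bin probabilities over A azimuth bins
      source_azimuth_bins: one azimuth bin index per estimated source
    Returns soft masks and masked streams, both of shape (S, T, F).
    """
    tf_features = np.asarray(tf_features, dtype=float)
    posteriors = np.asarray(azimuth_posteriors, dtype=float)
    n_sources = len(source_azimuth_bins)

    # Probability mass each T-F bin assigns to each source's azimuth: (T, F, S)
    per_source = posteriors[:, :, list(source_azimuth_bins)]

    # Normalize across sources so the masks sum to 1 in every T-F bin.
    # Bins with no mass on any source fall back to a uniform split.
    total = per_source.sum(axis=-1, keepdims=True)
    masks = np.divide(
        per_source,
        total,
        out=np.full_like(per_source, 1.0 / n_sources),
        where=total > 0,
    )

    masks = np.moveaxis(masks, -1, 0)        # (S, T, F)
    streams = masks * tf_features[None]      # one soft-masked stream per source
    return masks, streams
```

Each row of `streams` could then be passed to a separate sound event detector, so that every detection is tied to the azimuth of the stream it came from. How the paper actually derives its masks and which features it uses are described in the full text, not in this summary.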

## Key findings

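- Segregated detection of sound types is evaluated on a suite of simulated test scenes with multiple co-occurring sound sources, using proposed performance measures that relate scene complexity to detection performance
- The spatial arrangement of a scene affects performance, and a robot could facilitate high performance through optimal head rotation
- The approach yields joint sound event location and type information under a wide range of conditions, including localization error and error in the estimated number of active sources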

## Abstract

Identification and localization of sounds are both integral parts of computational auditory scene analysis. Although each can be solved separately, the goal of forming coherent auditory objects and achieving a comprehensive spatial scene understanding suggests pursuing a joint solution of the two problems. This work presents an approach that robustly binds localization with the detection of sound events in a binaural robotic system. Both tasks are joined through the use of spatial stream segregation which produces probabilistic time-frequency masks for individual sources attributable to separate locations, enabling segregated sound event detection operating on these streams. We use simulations of a comprehensive suite of test scenes with multiple co-occurring sound sources, and propose performance measures for systematic investigation of the impact of scene complexity on this segregated detection of sound types. Analyzing the effect of spatial scene arrangement, we show how a robot could facilitate high performance through optimal head rotation. Furthermore, we investigate the performance of segregated detection given possible localization error as well as error in the estimation of number of active sources. Our analysis demonstrates that the proposed approach is an effective method to obtain joint sound event location and type information under a wide range of conditions.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.00055/full.md

## Figures

33 figures with captions in the complete paper: https://tomesphere.com/paper/1904.00055/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/1904.00055/full.md

---
Source: https://tomesphere.com/paper/1904.00055