TL;DR
This paper introduces SALSA-Lite, a lightweight and fast feature for polyphonic sound event localization and detection that significantly improves computational efficiency while maintaining competitive accuracy.
Contribution
SALSA-Lite is a novel, computationally efficient feature for polyphonic SELD that outperforms traditional features and is 30 times faster than the original SALSA feature.
Findings
SALSA-Lite achieves a 15% higher localization F1 score than traditional features.
SALSA-Lite outperforms traditional features in localization recall.
The method is 30 times faster than the original SALSA feature.
Abstract
Polyphonic sound event localization and detection (SELD) has many practical applications in acoustic sensing and monitoring. However, the development of real-time SELD has been limited by the demanding computational requirement of most recent SELD systems. In this work, we introduce SALSA-Lite, a fast and effective feature for polyphonic SELD using microphone array inputs. SALSA-Lite is a lightweight variation of a previously proposed SALSA feature for polyphonic SELD. SALSA, which stands for Spatial Cue-Augmented Log-Spectrogram, consists of multichannel log-spectrograms stacked channelwise with the normalized principal eigenvectors of the spectrotemporally corresponding spatial covariance matrices. In contrast to SALSA, which uses eigenvector-based spatial features, SALSA-Lite uses normalized inter-channel phase differences as spatial features, allowing a 30-fold speedup compared to…
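The feature construction described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it stacks per-channel log-power spectrograms with the phase differences of channels 2..M relative to channel 1. The choice of normalization (dividing the phase difference by 2πf/c so it becomes roughly frequency-independent), the reference channel, and all parameter values (`fs`, `n_fft`, `hop`, `c`) are assumptions made for illustration, and the helper names `stft` and `salsa_lite_sketch` are hypothetical.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Naive STFT of a (channels, samples) array -> (channels, frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    frames = np.stack(
        [x[:, i * hop : i * hop + n_fft] for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames * window, axis=-1)

def salsa_lite_sketch(x, fs=24000, n_fft=512, hop=256, c=343.0, eps=1e-8):
    """Stack log-spectrograms with normalized inter-channel phase differences.

    x: (M, samples) microphone-array signal.
    Returns an array of shape (2M - 1, frames, n_fft // 2); the DC bin is
    dropped because the assumed normalization divides by frequency.
    """
    X = stft(x, n_fft, hop)                        # (M, T, F)
    log_spec = 10.0 * np.log10(np.abs(X) ** 2 + eps)

    # Phase difference of channels 2..M relative to channel 1.
    ipd = np.angle(np.conj(X[:1]) * X[1:])         # (M - 1, T, F)

    # Assumed normalization: scale by -c / (2*pi*f), skipping f = 0.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)[1:]
    nipd = ipd[..., 1:] * (-c / (2.0 * np.pi * freqs))

    return np.concatenate([log_spec[..., 1:], nipd], axis=0)

# Usage: one second of random 4-channel audio at the assumed sample rate.
rng = np.random.default_rng(0)
features = salsa_lite_sketch(rng.standard_normal((4, 24000)))
print(features.shape)
```

Because the spatial channels need only an element-wise product and a phase angle per time-frequency bin, this kind of feature avoids the eigendecomposition of spatial covariance matrices that the abstract attributes to the original SALSA.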
