Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes
Byeong-Yun Ko, Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Seung-Deok Choi, Yong-Hwa Park

TL;DR
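This paper improves sound event localization and detection in real scenes by augmenting data along the channel, frequency, and time dimensions, including a novel method called Moderate Mixup, and by applying Squeeze-and-Excitation blocks on the channel and frequency dimensions for feature extraction. The trained models reach an ER of 0.53 and an F1 score of 49.8% on the STARSS22 test dataset.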
Contribution
It introduces a new data augmentation method called Moderate Mixup and applies Squeeze-and-Excitation blocks across multiple dimensions for improved SELD performance.
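This summary does not describe how Moderate Mixup differs from ordinary mixup, so the sketch below shows only standard mixup as background, not the paper's method. The function name `mixup`, the `alpha` default, and the array shapes are assumptions chosen for illustration.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Standard mixup (NOT the paper's Moderate Mixup): blend two
    examples and their targets with a single random weight."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)      # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2   # blended input features
    y = lam * y1 + (1.0 - lam) * y2   # blended targets
    return x, y, lam
```

With `x1` and `x2` as multi-channel feature maps of equal shape (for example channel by frequency by time), the second example acts as an interfering signal whose strength is set by `1 - lam`.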
Findings
Best error rate (ER) of 0.53 on the STARSS22 test dataset.
Best F1 score of 49.8%.
Best localization error (LE) of 16.0°.
Best localization recall (LR) of 56.2%.
Abstract
The performance of sound event localization and detection (SELD) in real scenes is limited by the small size of SELD datasets, owing to the difficulty of obtaining a sufficient amount of realistic multi-channel audio recordings with accurate labels. We used two main strategies to address the problems arising from the small real SELD dataset. First, we applied various data augmentation methods on all data dimensions: channel, frequency, and time. We also propose an original data augmentation method named Moderate Mixup to simulate situations where a noise floor or interfering events exist. Second, we applied Squeeze-and-Excitation blocks on the channel and frequency dimensions to efficiently extract feature characteristics. On the STARSS22 test dataset, our trained models achieved best ER, F1, LE, and LR of 0.53, 49.8%, 16.0°, and 56.2%, respectively.
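As a rough illustration of a Squeeze-and-Excitation gate applied along a chosen dimension, the sketch below uses NumPy on a feature map assumed to have shape (channel, frequency, time). The function name `se_gate`, the weight shapes, and the reduction ratios are hypothetical; the abstract does not specify how the authors arrange or train these blocks.

```python
import numpy as np

def se_gate(x, axis, w1, w2):
    """Squeeze-and-Excitation along one axis of x (a minimal sketch).

    x    : feature map, assumed shape (channel, frequency, time)
    axis : dimension to re-weight (0 = channel, 1 = frequency)
    w1   : (n // r, n) bottleneck weights, n = x.shape[axis]
    w2   : (n, n // r) expansion weights
    """
    other = tuple(i for i in range(x.ndim) if i != axis)
    z = x.mean(axis=other)                 # squeeze: average pooling -> (n,)
    h = np.maximum(w1 @ z, 0.0)            # dense layer + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))    # dense layer + sigmoid gate in (0, 1)
    shape = [1] * x.ndim
    shape[axis] = x.shape[axis]
    return x * s.reshape(shape)            # excite: rescale along `axis`

# Illustrative use with random (untrained) weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64, 100))
x = se_gate(x, 0, rng.standard_normal((2, 4)), rng.standard_normal((4, 2)))
x = se_gate(x, 1, rng.standard_normal((16, 64)), rng.standard_normal((64, 16)))
```

Applying the gate once per dimension, as in the last two lines, is one plausible reading of "on channel and frequency dimensions"; in a trained network the weights would be learned jointly with the convolutional layers.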
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Average Pooling · Dense Connections · Convolution · Mixup · Squeeze-and-Excitation Block
