Team HYU ASML ROBOVOX SP Cup 2024 System Description
Jeong-Hwan Choi, Gaeun Kim, Hee-Jae Lee, Seyun Ahn, Hyun-Soo Kim, and Joon-Hyuk Chang

TL;DR
This paper presents HYU ASML's system for the IEEE Signal Processing Cup 2024 challenge, focusing on far-field speaker recognition in noisy environments using deep neural networks, data augmentation, and diverse training data.
Contribution
The team developed a robust speaker recognition system combining residual and time-delay neural networks trained on diverse data, achieving high accuracy in challenging conditions.
Findings
Achieved second place on the SP Cup 2024 public leaderboard.
Reached a detection cost function (DCF) of 0.5245.
Attained an equal error rate (EER) of 6.46% in noisy, reverberant environments.
Abstract
This report describes the HYU ASML team's submission to the IEEE Signal Processing Cup 2024 (SP Cup 2024). The challenge, titled "ROBOVOX: Far-Field Speaker Recognition by a Mobile Robot," focuses on speaker recognition by a mobile robot in noisy and reverberant conditions. Our solution combines the outputs of speaker embedding models based on deep residual neural networks and time-delay neural networks. These models were trained on a diverse dataset that includes French speech. Because the evaluation environment is characterized by high noise, reverberation, and short utterances, we focused on data augmentation and on the duration of the training speech used for the speaker embedding models. Our submission achieved second place on the SP Cup 2024 public leaderboard, with a detection cost function of 0.5245 and an equal error rate of 6.46%.
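The two metrics quoted in the abstract can be illustrated with a minimal Python sketch. It is not the challenge's official scoring tool: the function names are hypothetical, and the cost parameters `p_target`, `c_miss`, and `c_fa` are placeholder assumptions, since the summary above does not state the values used by SP Cup 2024. The sketch takes verification scores for target (same-speaker) and non-target (different-speaker) trials, sweeps a decision threshold, and reports the equal error rate and the minimum normalized detection cost.

```python
def error_rates(target_scores, nontarget_scores):
    """Return (threshold, miss_rate, false_alarm_rate) for each candidate threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))  # extra point where every trial is rejected
    points = []
    for t in thresholds:
        # A trial is accepted when its score is at or above the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((t, p_miss, p_fa))
    return points


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER as the operating point where miss and false-alarm rates are closest."""
    points = error_rates(target_scores, nontarget_scores)
    _, p_miss, p_fa = min(points, key=lambda p: abs(p[1] - p[2]))
    return (p_miss + p_fa) / 2


def min_dcf(target_scores, nontarget_scores, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost; the default parameters are assumptions."""
    points = error_rates(target_scores, nontarget_scores)
    raw = min(
        c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
        for _, p_miss, p_fa in points
    )
    # Normalize by the cost of the better trivial system (accept all or reject all).
    return raw / min(c_miss * p_target, c_fa * (1.0 - p_target))


if __name__ == "__main__":
    # Toy scores for illustration only; they are not challenge data.
    targets = [0.9, 0.8, 0.7, 0.3]
    nontargets = [0.6, 0.4, 0.2, 0.1]
    print(f"EER: {equal_error_rate(targets, nontargets):.2%}")
    print(f"minDCF: {min_dcf(targets, nontargets):.4f}")
```

A lower value is better for both metrics. The EER summarizes a single balanced operating point, while the detection cost weights misses and false alarms according to the assumed target prior and costs.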
Taxonomy
Topics: Industrial Technology and Control Systems · Real-time simulation and control systems
