RoboVox Far Field Speaker Recognition: A Novel Data Augmentation Approach with Pretrained Models
Muhammad Sudipto Siam Dip, Md Anik Hasan, Sapnil Sarker Bipro, Md Abdur Raiyan, Mohammod Abdul Motin

TL;DR
This paper introduces a novel noise-based data augmentation method for far-field speaker recognition, leveraging pre-trained models like ResNet to improve accuracy and robustness in challenging acoustic environments.
Contribution
The study proposes a new noise augmentation technique that aligns test and enrollment data sources, demonstrating its effectiveness with pre-trained models, especially ResNet, in far-field speaker recognition.
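The contribution above describes adding noise to enrollment recordings so that they resemble the far-field test audio. The summary does not specify the noise source, the mixing level, or the tooling, so the following is only a minimal sketch under assumptions: additive mixing at a fixed signal-to-noise ratio, a hypothetical helper named `add_noise_at_snr`, a 10 dB target, and synthetic arrays standing in for real audio.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Mix `noise` into `speech` at the requested SNR in dB (hypothetical helper)."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Loop or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise signal is silent; cannot scale it to a target SNR")

    # Scale the noise so that speech_power / scaled_noise_power = 10^(snr_db / 10).
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise


# Synthetic stand-ins (assumptions): a 1 s tone as the "clean enrollment"
# recording and white noise as the "far-field" noise sample.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean_enrollment = 0.5 * np.sin(2 * np.pi * 220.0 * t)
room_noise = rng.normal(0.0, 1.0, size=8000)

noisy_enrollment = add_noise_at_snr(clean_enrollment, room_noise, snr_db=10.0)
```

In a real pipeline the noisy enrollment waveform, rather than the clean one, would be passed to the pre-trained embedding model before scoring against the test utterance. The choice of 10 dB here is arbitrary and would need tuning against the actual test conditions.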
Findings
ResNet achieved a DCF of 0.84 and an EER of 13.44% before augmentation (lower is better for both metrics).
Augmentation improved ResNet performance to a DCF of 0.75 and an EER of 12.79%.
ResNet outperformed the ECAPA, Mel-spectrogram, pyannote, and TitaNet-Large models.
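The findings above are reported as EER and DCF, both of which are error measures where lower is better. As a rough illustration of how an EER is obtained from verification scores, and of the size of the reported gains, here is a small sketch. The function names and the toy scores are assumptions made for this example; DCF is not computed because the summary does not give the cost parameters or target prior it depends on.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Return the EER as a fraction, using every observed score as a threshold."""
    target_scores = np.asarray(target_scores, dtype=np.float64)
    nontarget_scores = np.asarray(nontarget_scores, dtype=np.float64)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # Miss rate: same-speaker trials scored below the threshold (rejected).
    miss = np.array([(target_scores < th).mean() for th in thresholds])
    # False-alarm rate: different-speaker trials at or above the threshold (accepted).
    false_alarm = np.array([(nontarget_scores >= th).mean() for th in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    i = np.argmin(np.abs(miss - false_alarm))
    return (miss[i] + false_alarm[i]) / 2.0


def relative_reduction(before, after):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (before - after) / before


# Toy similarity scores (assumption, not data from the paper).
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])

# Relative gains implied by the reported ResNet numbers.
dcf_gain = relative_reduction(0.84, 0.75)
eer_gain = relative_reduction(13.44, 12.79)
```

With the reported figures, the augmentation corresponds to a relative reduction of about 10.7% in DCF and about 4.8% in EER.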
Abstract
In this study, we address the challenge of speaker recognition using a novel data augmentation technique: adding noise to enrollment files. This technique efficiently aligns the sources of the test and enrollment files, enhancing comparability. Various pre-trained models were employed, with the ResNet model achieving the best baseline results, a DCF of 0.84 and an EER of 13.44%. The augmentation technique notably improved these results to a DCF of 0.75 and an EER of 12.79% for the ResNet model. Comparative analysis revealed the superiority of ResNet over models such as ECAPA, Mel-spectrogram, pyannote, and TitaNet-Large. These results, together with the different augmentation schemes explored, contribute to the success of RoboVox far-field speaker recognition in this paper.
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Kaiming Initialization · Max Pooling · Convolution · Average Pooling · Global Average Pooling
