HI-MIA: A Far-field Text-Dependent Speaker Verification Database and the Baselines
Xiaoyi Qin, Hui Bu, Ming Li

TL;DR
This paper introduces HI-MIA, a far-field text-dependent speaker verification database recorded with multiple microphone arrays, and establishes neural network baselines that reach 3.29% EER with far-field enrollment and 4.02% EER with close-talking enrollment, both tested on far-field speech.
Contribution
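The paper provides a far-field, text-dependent, microphone-array speaker verification database, filling a gap left by public datasets that are mostly single-channel, close-talking, and text-independent. It also provides end-to-end neural network baseline systems trained on single-channel data, together with a testing-background-aware enrollment augmentation strategy.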
Findings
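Fusion systems achieved 3.29% EER in the far-field enrollment, far-field testing task.
Fusion systems achieved 4.02% EER in the close-talking enrollment, far-field testing task.
The dataset covers 340 speakers recorded by multiple microphone arrays and a high-fidelity close-talking microphone, making it suitable for far-field speaker verification research.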
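The results above are reported as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how EER can be computed from trial scores. It is not the authors' evaluation code; the function name compute_eer, the threshold sweep over observed scores, and the example scores are all assumptions made for illustration.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) for a set of verification trial scores.

    Hypothetical helper for illustration only. A trial is accepted when
    its score is greater than or equal to the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.unique(np.concatenate([target, nontarget]))

    # False acceptance rate: fraction of impostor trials accepted.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # False rejection rate: fraction of genuine trials rejected.
    frr = np.array([(target < t).mean() for t in thresholds])

    # The EER lies where the two error curves cross; with a finite set of
    # trials, take the threshold where they are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


# Toy example with made-up scores (not data from the paper).
genuine = [0.9, 0.6, 0.4, 0.8]
impostor = [0.1, 0.5, 0.7, 0.2]
eer, threshold = compute_eer(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On real trial lists the two error curves rarely meet exactly, so evaluation toolkits often interpolate between neighbouring thresholds; the nearest-point average used here is a simplification.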
Abstract
This paper presents a far-field text-dependent speaker verification database named HI-MIA. We aim to meet the data requirement for far-field microphone-array-based speaker verification, since most of the publicly available databases are single-channel, close-talking, and text-independent. The database contains recordings of 340 people in rooms designed for the far-field scenario. Recordings are captured by multiple microphone arrays located at different directions and distances from the speaker, and by a high-fidelity close-talking microphone. In addition, we propose a set of end-to-end neural network based baseline systems that adopt single-channel data for training. Moreover, we propose a testing-background-aware enrollment augmentation strategy to further enhance the performance. Results show that the fusion systems could achieve 3.29% EER in the far-field enrollment, far-field testing task and…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
