HI-MIA : A Far-field Text-Dependent Speaker Verification Database and the Baselines

Xiaoyi Qin; Hui Bu; Ming Li

arXiv:1912.01231 · cs.SD · February 4, 2020 · 1 citation

Open Access

TL;DR

This paper introduces HI-MIA, a new far-field text-dependent speaker verification database recorded with multiple microphone arrays, and establishes end-to-end neural network baselines whose fusion reaches 3.29% EER when both enrollment and testing are far-field.

Contribution

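The paper provides a far-field, text-dependent speaker verification database recorded with multiple microphone arrays and a close-talking microphone, together with baseline neural network systems and a testing-background-aware enrollment augmentation strategy. It fills a gap in existing public datasets, which are mostly single-channel, close-talking, and text-independent.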
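To make the idea of background-aware enrollment augmentation concrete, here is a minimal Python sketch. It assumes the strategy amounts to corrupting clean enrollment utterances with background audio recorded in the test condition, at a few signal-to-noise ratios (SNRs), and keeping both the clean and the corrupted copies for enrollment. The helpers `mix_at_snr` and `augment_enrollment`, the SNR values, and the tiling of the noise are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so that the speech-to-added-noise ratio is `snr_db` dB.

    Both inputs are mono sample lists at the same sample rate; `noise` is
    tiled or truncated to the length of `speech` (a hypothetical choice).
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        return list(speech)  # silent background: nothing to add
    # Choose gain g so that 10*log10(p_speech / (g^2 * p_noise)) == snr_db.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def augment_enrollment(enroll_utts, test_background, snrs_db=(0, 5, 10), seed=0):
    """Hypothetical helper: return the clean enrollment utterances plus one
    copy of each corrupted with test-condition background at a random SNR."""
    rng = random.Random(seed)
    augmented = list(enroll_utts)
    for utt in enroll_utts:
        snr = rng.choice(snrs_db)
        augmented.append(mix_at_snr(utt, test_background, snr))
    return augmented
```

The enrollment model (for example, an averaged speaker embedding) would then be computed from the augmented list, so that it better matches the acoustic conditions of the far-field test trials.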

Findings

01

Achieved 3.29% EER with fusion systems in the far-field enrollment, far-field testing task.

02

Achieved 4.02% EER with close-talking enrollment and far-field testing.

03

Presented a new dataset suitable for real-world far-field speaker verification.
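The findings above are reported as equal error rates (EER): the operating point at which the false-rejection rate on target (same-speaker) trials equals the false-acceptance rate on non-target trials. The following Python sketch computes an approximate EER by sweeping every observed score as a decision threshold. The function name `compute_eer` and the convention that higher scores mean "same speaker" are assumptions; evaluation toolkits often interpolate along the ROC curve instead, so their values can differ slightly.

```python
def compute_eer(target_scores, nontarget_scores):
    """Return an approximate equal error rate (as a fraction).

    Higher scores are assumed to mean "same speaker". Every observed score
    is tried as a threshold; at the threshold where the false-rejection rate
    (FRR) and false-acceptance rate (FAR) are closest, their mean is returned.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need both target and non-target scores")
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # A trial is accepted when its score is >= the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2)
    return best[1]
```

For instance, `compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])` returns 0.25, i.e. 25% EER; a reported 3.29% EER corresponds to a return value of 0.0329.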

Abstract

This paper presents a far-field text-dependent speaker verification database named HI-MIA. We aim to meet the data requirements of far-field, microphone-array-based speaker verification, since most publicly available databases are single-channel, close-talking, and text-independent. The database contains recordings of 340 people in rooms designed for the far-field scenario. Recordings are captured by multiple microphone arrays placed at different directions and distances from the speaker, and by a high-fidelity close-talking microphone. In addition, we propose a set of end-to-end neural network based baseline systems that use single-channel data for training. We also propose a testing-background-aware enrollment augmentation strategy to further improve performance. Results show that the fusion systems achieve 3.29% EER in the far-field enrollment, far-field testing task and…
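The abstract reports results for fusion systems but does not say here how the baselines are combined. A common generic approach is score-level fusion: normalize each system's scores over the trial list, then take a weighted sum. The sketch below assumes z-normalization and equal weights by default; `fuse_scores` and its parameters are hypothetical, and this is not necessarily the fusion used in the paper.

```python
import statistics


def fuse_scores(system_scores, weights=None):
    """Hypothetical score-level fusion of several verification systems.

    `system_scores` is a list of per-system score lists, all aligned to the
    same trial list. Each system's scores are z-normalized over the trials,
    then combined with a weighted sum (equal weights by default).
    """
    n_sys = len(system_scores)
    if n_sys == 0:
        raise ValueError("need at least one system")
    n_trials = len(system_scores[0])
    if any(len(s) != n_trials for s in system_scores):
        raise ValueError("all systems must score the same trials")
    if weights is None:
        weights = [1.0 / n_sys] * n_sys
    fused = [0.0] * n_trials
    for w, scores in zip(weights, system_scores):
        mu = statistics.fmean(scores)
        sd = statistics.pstdev(scores) or 1.0  # guard against constant scores
        for i, s in enumerate(scores):
            fused[i] += w * (s - mu) / sd
    return fused
```

Normalizing first keeps a system with a wide score range from dominating the sum. In practice the weights would usually be tuned on a development set (for example, by logistic regression) rather than fixed.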

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing