Delving into VoxCeleb: environment invariant speaker recognition
Joon Son Chung, Jaesung Huh, Seongkyu Mun

TL;DR
This paper introduces an environment adversarial training framework that leverages video data in VoxCeleb to learn speaker embeddings invariant to environmental conditions, improving generalization in speaker recognition tasks.
Contribution
It proposes an adversarial training method that uses the video information in VoxCeleb, previously unused for this purpose, to make speaker embeddings invariant to the recording environment.
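The summary does not spell out the training objective, so the following is only a minimal sketch of one common way to express adversarial environment invariance: a speaker classification loss plus a "confusion" term that is smallest when an environment classifier's output is uniform. The function names, the weight `alpha`, and the choice of a confusion loss are assumptions made for illustration, not details confirmed by this page.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Standard cross-entropy against a single integer class label."""
    return -math.log(softmax(logits)[label])

def confusion_loss(env_logits):
    """Cross-entropy between a uniform target and the environment prediction.

    It is minimised when the environment classifier cannot tell the
    environments apart, i.e. when its output distribution is uniform.
    """
    probs = softmax(env_logits)
    return -sum(math.log(p) for p in probs) / len(probs)

def embedding_objective(speaker_logits, speaker_label, env_logits, alpha=1.0):
    """Hypothetical objective for the embedding network: stay
    speaker-discriminative while confusing the environment classifier.
    `alpha` is an assumed trade-off weight."""
    return cross_entropy(speaker_logits, speaker_label) + alpha * confusion_loss(env_logits)
```

In a setup like this, the environment classifier itself would be trained with an ordinary cross-entropy loss on its own parameters, while the embedding network is updated with the combined objective.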
Findings
Significant performance improvements over baselines in speaker identification.
Enhanced generalization to unseen environmental conditions.
Effective use of video data for environment-invariant feature learning.
Abstract
Research in speaker recognition has recently seen significant progress due to the application of neural network models and the availability of new large-scale datasets. There has been a plethora of work in search for more powerful architectures or loss functions suitable for the task, but these works do not consider what information is learnt by the models, apart from being able to predict the given labels. In this work, we introduce an environment adversarial training framework in which the network can effectively learn speaker-discriminative and environment-invariant embeddings without explicit domain shift during training. We achieve this by utilising the previously unused 'video' information in the VoxCeleb dataset. The environment adversarial training allows the network to generalise better to unseen conditions. The method is evaluated on both speaker identification and…
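As an illustration of how 'video' information could serve as an environment label, the sketch below draws, for one speaker, two segments from the same video (assumed to share an environment) and one segment from a different video (assumed to differ in environment). The data layout, the function name `sample_triplet`, and the segment names are hypothetical; the abstract does not describe the sampling procedure, so this is a plausible reading rather than the authors' implementation.

```python
import random

def sample_triplet(videos_by_speaker, speaker, rng=random):
    """Return (anchor, same_env, diff_env) segments for one speaker.

    `videos_by_speaker` maps speaker -> {video_id: [segment, ...]}.
    Anchor and same_env come from one video; diff_env comes from another
    video of the same speaker.
    """
    videos = videos_by_speaker[speaker]
    # Videos that can supply two distinct segments.
    eligible = [v for v, segs in videos.items() if len(segs) >= 2]
    if not eligible:
        raise ValueError("speaker needs a video with at least two segments")
    same_video = rng.choice(eligible)
    # Any other non-empty video of the same speaker.
    others = [v for v, segs in videos.items() if v != same_video and segs]
    if not others:
        raise ValueError("speaker needs a second video with at least one segment")
    anchor, same_env = rng.sample(videos[same_video], 2)
    diff_env = rng.choice(videos[rng.choice(others)])
    return anchor, same_env, diff_env

# Usage with placeholder segment names (not real VoxCeleb paths).
example = {"spk1": {"vidA": ["a1", "a2", "a3"], "vidB": ["b1"]}}
anchor, same_env, diff_env = sample_triplet(example, "spk1", random.Random(0))
```

All three segments share a speaker label, so a speaker loss treats them alike, while an environment classifier can be asked to distinguish the same-video pair from the cross-video pair.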
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
