Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings
Dávid Sztahó, Attila Fejes

TL;DR
This study investigates the impact of language mismatch on deep learning-based forensic voice comparison, showing that models trained on large multilingual datasets perform well even on low-resource languages like Hungarian.
Contribution
It demonstrates that speaker embedding models pre-trained on large multilingual datasets can be used effectively for forensic voice comparison in low-resource languages, despite language differences.
Findings
Models trained on large multilingual datasets perform well across languages.
Sample duration positively influences verification performance.
Speaking style variations have minimal impact on results.
Abstract
In forensic voice comparison, speaker embeddings have become widely popular in the last 10 years. Most pretrained speaker embedding models are trained on English corpora, because these are easily accessible. Thus, language dependency can be an important factor in automatic forensic voice comparison, especially when the target language is linguistically very different. There are numerous commercial systems available, but their models are mainly trained on a language (mostly English) other than the target language. In the case of a low-resource language, developing a corpus for forensic purposes containing enough speakers to train deep learning models is costly. This study aims to investigate whether a model pre-trained on an English corpus can be used on a low-resource target language (here, Hungarian) different from the one the model was trained on. Also, often multiple samples are not available…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
