Deep learning methods in speaker recognition: a review
Dávid Sztahó, György Szaszák, András Beke

TL;DR
This review discusses how deep learning has become the leading approach in speaker recognition, replacing traditional methods with techniques like x-vectors, driven by increasing data availability and advancements in machine learning.
Contribution
It provides a comprehensive overview of deep learning applications in speaker recognition, highlighting the shift from traditional methods to DL-based solutions.
Findings
Deep learning now dominates speaker recognition methods.
x-vectors are the standard baseline in recent research.
Deep learning's effectiveness grows with more data.
Abstract
This paper summarizes the deep learning practices applied in the field of speaker recognition, covering both verification and identification. Speaker recognition has long been a widely used area of speech technology. Many research works have been carried out, yet little progress has been achieved in the past 5-6 years. However, as deep learning techniques advance in most machine learning fields, they are replacing the former state-of-the-art methods in speaker recognition too. DL now appears to be the state-of-the-art solution for both speaker verification and identification. Standard x-vectors, in addition to i-vectors, are used as baselines in most of the novel works. The increasing amount of gathered data opens up the territory to DL methods, which are most effective when large amounts of data are available.
