Improved Large-margin Softmax Loss for Speaker Diarisation
Yassir Fathullah, Chao Zhang, Philip C. Woodland

TL;DR
This paper introduces an improved large-margin softmax loss for speaker diarisation that makes speaker embeddings more discriminative and training more stable, giving relative speaker error rate (SER) reductions of up to 29.5% on the AMI meeting corpus.
Contribution
It proposes a large-margin softmax loss formulated without approximations, a method to stabilise training with that loss, and a margin strategy for overlapping speech, which together improve diarisation accuracy.
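The abstract does not spell out the overlapping-speech margin strategy, so the following is only a hypothetical sketch of the general idea: each training segment gets its own margin, with a smaller one when the segment is flagged as containing overlapped speech. For simplicity it uses a CosFace-style additive cosine margin; the margin values 0.2 and 0.05 and the names `per_segment_margins` and `additive_margin_softmax_loss` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def per_segment_margins(is_overlap, clean_margin=0.2, overlap_margin=0.05):
    """Hypothetical rule: a smaller margin for segments containing overlapped speech."""
    return np.where(is_overlap, overlap_margin, clean_margin)


def additive_margin_softmax_loss(cos, labels, margins, s=30.0):
    """Per-segment CosFace-style loss with target logit s * (cos - m_i).

    cos: (N, C) cosines between normalised embeddings and speaker weights,
    labels: (N,) speaker indices, margins: (N,) per-segment margins.
    """
    rows = np.arange(len(labels))
    logits = np.array(cos, dtype=float)  # copy, so the caller's array is untouched
    logits[rows, labels] -= margins      # margin applied only to the true speaker
    logits *= s
    # Numerically stable log-softmax over the speaker classes.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels]      # one loss value per segment


# Toy usage: segments 1 and 3 are (hypothetically) flagged as overlapped.
is_overlap = np.array([False, True, False, True])
print(per_segment_margins(is_overlap))
```

A smaller margin asks less separation of segments whose embeddings are contaminated by a second speaker, so they pull less strongly on the class boundaries than clean segments do.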
Findings
Achieved a 24.6% relative SER reduction over the baseline.
Improved further to a 29.5% relative SER reduction using margin strategies for overlapping speech.
Demonstrated effectiveness on the AMI meeting corpus.
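These reductions are relative, not absolute. As a worked example with a hypothetical baseline SER of 10.0% (not a figure from the paper), a 24.6% relative reduction gives 10.0 × (1 − 0.246) = 7.54% SER, and a 29.5% relative reduction gives 10.0 × (1 − 0.295) = 7.05% SER. The two helper functions below are illustrative only.

```python
def relative_reduction(baseline_ser: float, new_ser: float) -> float:
    """Relative SER reduction, in percent of the baseline SER."""
    return 100.0 * (baseline_ser - new_ser) / baseline_ser


def ser_after_reduction(baseline_ser: float, relative_pct: float) -> float:
    """SER obtained after a given relative reduction (in percent) of the baseline."""
    return baseline_ser * (1.0 - relative_pct / 100.0)


# Hypothetical baseline of 10.0% SER, used only to illustrate the arithmetic.
print(round(ser_after_reduction(10.0, 24.6), 2))
print(round(ser_after_reduction(10.0, 29.5), 2))
```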
Abstract
Speaker diarisation systems nowadays use embeddings generated from speech segments in a bottleneck layer, and these embeddings need to be discriminative for unseen speakers. Large-margin training is well known to improve generalisation to unseen data, and it is widely used in such open-set problems. This paper therefore introduces a general approach to the large-margin softmax loss, without any approximations, to improve the quality of speaker embeddings for diarisation. Furthermore, a novel and simple way to stabilise training when large-margin softmax is used is proposed. Finally, to combat overlapping speech, different training margins are used to reduce its negative effect on learning discriminative embeddings. Experiments on the AMI meeting corpus show that the use of large-margin softmax significantly improves the speaker…
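As a rough illustration of what a large-margin softmax loss computes, here is a NumPy sketch using a combined margin on the angle θ between an embedding and its true speaker's weight vector, cos(m1·θ + m2) − m3, which has SphereFace-style (m1), ArcFace-style (m2) and CosFace-style (m3) margins as special cases. Because cos(m1·θ + m2) stops decreasing once its argument passes π, the target logit is extended piecewise, in the way popularised by SphereFace, so that it stays monotonic in θ. This is an assumed formulation for illustration, not necessarily the authors' exact loss; the scale `s`, the margin values and the function names are placeholders.

```python
import numpy as np


def monotonic_margin_cosine(theta, m1=1.0, m2=0.0, m3=0.0):
    """Target-class logit for a combined margin cos(m1*theta + m2) - m3.

    cos(phi) stops decreasing once phi = m1*theta + m2 passes pi, so it is
    extended piecewise as (-1)^k * cos(phi) - 2k for phi in [k*pi, (k+1)*pi].
    The result is continuous and non-increasing in theta.
    """
    phi = m1 * theta + m2
    k = np.floor(phi / np.pi)
    sign = np.where(k % 2 == 0, 1.0, -1.0)
    return sign * np.cos(phi) - 2.0 * k - m3


def large_margin_softmax_loss(embeddings, weights, labels,
                              s=30.0, m1=1.0, m2=0.0, m3=0.0):
    """Mean cross-entropy over scaled cosine logits, margin on the true class only.

    embeddings: (N, D) segment embeddings, weights: (C, D) speaker weight vectors,
    labels: (N,) integer speaker indices.
    """
    # L2-normalise both sides so that the logits are cosine similarities.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)
    rows = np.arange(len(labels))
    theta = np.arccos(cos[rows, labels])  # angle to the true speaker's weight
    logits = cos.copy()
    logits[rows, labels] = monotonic_margin_cosine(theta, m1, m2, m3)
    logits = s * logits
    # Numerically stable log-softmax over the speaker classes.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()


# Toy usage with random data (8 segments, 16-dim embeddings, 4 speakers).
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
W = rng.normal(size=(4, 16))
y = rng.integers(0, 4, size=8)
print(large_margin_softmax_loss(emb, W, y, s=10.0))
print(large_margin_softmax_loss(emb, W, y, s=10.0, m1=2.0, m2=0.2, m3=0.1))
```

With m1 = 1 and m2 = m3 = 0 this reduces to a normalised softmax with scale s. Any margins with m1 ≥ 1 and m2, m3 ≥ 0 can only lower the true-speaker logit, so the loss never decreases; the network must therefore pull each embedding closer to its speaker's weight vector than plain softmax would require.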
Taxonomy
Methods: Softmax
