Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss
Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno

TL;DR
This paper introduces a token-level training loss for Transformer-Transducer based speaker change detection. The loss improves detection accuracy by directly penalizing speaker change false accepts and false rejects during training.
Contribution
It proposes a novel token-based training strategy with a custom edit-distance algorithm to improve speaker change detection performance.
Findings
Significant improvement on a group of challenging real-world datasets.
Reduced token-level false accept (FA) and false reject (FR) rates.
New evaluation metrics that align better with commercial use cases.
Abstract
In this work we propose a novel token-based training strategy that improves Transformer-Transducer (T-T) based speaker change detection (SCD) performance. The conventional T-T based SCD model loss optimizes all output tokens equally. Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy. To mitigate this issue, we use a customized edit-distance algorithm to estimate the token-level SCD false accept (FA) and false reject (FR) rates during training and optimize model parameters to minimize a weighted combination of the FA and FR, focusing the model on accurately predicting speaker changes. We also propose a set of evaluation metrics that align better with commercial use cases. Experiments on a group of challenging real-world datasets show that the proposed training method can significantly improve…
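The abstract describes estimating token-level FA and FR with an edit-distance alignment and combining them with weights. The following is a minimal sketch of that idea for a single reference/hypothesis pair, not the authors' implementation. The token name `<st>`, the rule that a speaker change token may never be substituted for a word, the function names, and the weights `alpha` and `beta` are all assumptions made for illustration. How the paper turns these counts into a differentiable training objective is not stated in the text above and is not attempted here.

```python
from typing import List, Optional, Tuple

SC = "<st>"  # assumed name for the speaker change token
INF = float("inf")


def _sub_cost(r: str, h: str) -> float:
    """Substitution cost; word <-> speaker-change swaps are forbidden (assumption)."""
    if r == h:
        return 0
    if (r == SC) != (h == SC):
        return INF
    return 1


def align(ref: List[str], hyp: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Levenshtein alignment with backtrace; None marks a gap on that side."""
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + _sub_cost(ref[i - 1], hyp[j - 1]),
                cost[i - 1][j] + 1,  # token missing from the hypothesis
                cost[i][j - 1] + 1,  # extra token in the hypothesis
            )
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + _sub_cost(ref[i - 1], hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def count_fa_fr(ref: List[str], hyp: List[str]) -> Tuple[int, int]:
    """Count speaker change false accepts and false rejects from the alignment."""
    fa = fr = 0
    for r, h in align(ref, hyp):
        if h == SC and r != SC:
            fa += 1  # hypothesis predicts a change the reference lacks
        elif r == SC and h != SC:
            fr += 1  # hypothesis misses a reference change
    return fa, fr


def weighted_cost(fa: int, fr: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted combination of FA and FR counts (weights are illustrative)."""
    return alpha * fa + beta * fr
```

For example, `count_fa_fr(["a", "<st>", "b"], ["a", "b"])` counts one false reject, because the hypothesis drops the only speaker change in the reference. Forbidding word-to-`<st>` substitutions keeps a speaker change error from being hidden inside an ordinary word error, which is one plausible reason to customize the edit distance.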
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
Methods: ALIGN · Feedback Alignment
