An enhanced Conv-TasNet model for speech separation using a speaker distance-based loss function
Jose A. Arango-Sánchez, Julián D. Arias-Londoño

TL;DR
This paper improves speech separation in Spanish by enhancing Conv-TasNet with a speaker distance-based loss, achieving better SI-SDR scores and analyzing real-time deployment challenges.
Contribution
It introduces a novel Conv-TasNet architecture incorporating speaker similarity in the loss function, tailored for Spanish speech separation.
Findings
Best SI-SDR of 10.6 dB with the enhanced model
Inverse relationship between speaker similarity and performance
Real-time deployment issues with speaker channel synchronization
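For readers unfamiliar with the metric behind these numbers, here is a minimal NumPy sketch of SI-SDR using its commonly cited definition (zero-mean signals, projection of the estimate onto the reference). The function name `si_sdr` and the `eps` stabilizer are illustrative choices and are not taken from the paper's code.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything in the estimate that is not the scaled target counts as distortion.
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)

# Synthetic example: a reference signal plus low-level noise.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
estimate = reference + 0.1 * rng.standard_normal(16000)
score = si_sdr(estimate, reference)
```

Because the target is obtained by projection, rescaling the estimate leaves the score essentially unchanged, which is what makes the metric scale-invariant.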
Abstract
This work addresses the problem of speech separation in the Spanish language using pre-trained deep learning models. As with many speech processing tasks, large databases in languages other than English are scarce. Therefore, this work explores different training strategies using the Conv-TasNet model as a benchmark. The best training strategy achieved a scale-invariant signal-to-distortion ratio (SI-SDR) of 9.9 dB. We then identified experimentally an inverse relationship between the speakers' similarity and the model's performance, so an improved Conv-TasNet architecture was proposed. The enhanced Conv-TasNet model uses pre-trained speech embeddings to add a between-speakers cosine similarity term to the cost function, yielding an SI-SDR of 10.6 dB. Lastly, final experiments regarding real-time deployment show some drawbacks in the speakers' channel…
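The abstract does not give the exact form of the modified cost function, so the following is only a sketch of one plausible reading: a negative SI-SDR objective plus a weighted cosine similarity between the speaker embeddings of the two separated channels. The function `loss_with_speaker_term`, the additive combination, and the `weight` parameter are all assumptions made for illustration. The embeddings are passed in as plain vectors, since the paper's pre-trained embedding model is not specified here.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def loss_with_speaker_term(si_sdr_db, emb_a, emb_b, weight=1.0):
    """Hypothetical combined loss (an assumption, not the paper's formula).

    si_sdr_db: per-source SI-SDR values in dB (higher is better).
    emb_a, emb_b: speaker embeddings of the two separated channels.
    weight: assumed trade-off between separation quality and the similarity penalty.
    """
    separation_term = -float(np.mean(si_sdr_db))  # minimizing this maximizes SI-SDR
    similarity_term = cosine_similarity(emb_a, emb_b)  # high when outputs sound like one speaker
    return separation_term + weight * similarity_term

# Toy embeddings: identical outputs are penalized more than dissimilar ones.
loss_similar = loss_with_speaker_term([10.0, 10.0], [1.0, 0.0], [1.0, 0.0])
loss_distinct = loss_with_speaker_term([10.0, 10.0], [1.0, 0.0], [0.0, 1.0])
```

Under this reading, two output channels whose embeddings point in the same direction raise the loss, which pushes the separated signals toward distinct speaker identities. An actual training loop would need the embeddings to be computed differentiably from the separated waveforms, which this NumPy sketch does not attempt.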
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Convolutional time-domain audio separation network
