Improved I-vector-based Speaker Recognition for Utterances with Speaker Generated Non-speech sounds
Sri Harsha Dumpala, Ashish Panda, Sunil Kumar Kopparapu

TL;DR
This paper investigates the robustness of i-vector-based speaker recognition systems when dealing with non-speech sounds like laughter and breath, and finds that including laughter in training improves recognition accuracy.
Contribution
It introduces an analysis of speaker variation due to non-speech sounds and demonstrates that training with laughter enhances system performance on such utterances.
Findings
Including laughter in training improves recognition accuracy on speech-laugh segments.
Speaker-specific information is somewhat preserved across speech and non-speech sounds.
Training on non-speech sounds provides complementary information for speaker recognition.
Abstract
Conversational speech not only contains several variants of neutral speech but is also prominently interlaced with several speaker generated non-speech sounds such as laughter and breath. A robust speaker recognition system should be capable of recognizing a speaker irrespective of these variations in his speech. An understanding of whether the speaker-specific information represented by these variations is similar or not helps build a good speaker recognition system. In this paper, speaker variations captured by neutral speech of a speaker are analyzed by considering speech-laugh (a variant of neutral speech) and laughter (non-speech) sounds of the speaker. We study an i-vector-based speaker recognition system trained only on neutral speech and evaluate its performance on speech-laugh and laughter. Further, we analyze the effect of including laughter sounds during training of an…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
