One-Shot Speaker Identification for a Service Robot using a CNN-based Generic Verifier
Ivette Vélez (1), Caleb Rascon (1), Gibrán Fuentes-Pineda (1) ((1) Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS), Universidad Nacional Autónoma de México (UNAM), Mexico)

TL;DR
This paper introduces a CNN-based verification system that lets a service robot identify new users through one-shot learning, without retraining, by verifying input speech against an external database of user recordings, which makes it suitable for real-world applications.
Contribution
A novel Siamese CNN architecture for speaker verification that allows one-shot identification without retraining, suitable for dynamic service environments.
Findings
High verification accuracy demonstrated in experiments
Effective in real-life noisy environments
Fast identification process suitable for real-time applications
Abstract
In service robotics, there is interest in identifying the user by voice alone. However, in application scenarios where a service robot acts as a waiter or a store clerk, new users are expected to enter the environment frequently. Typically, speaker identification models need to be retrained when this occurs, which can take an impractical amount of time. In this paper, a new approach for speaker identification through verification is developed using a Siamese Convolutional Neural Network (SCNN) architecture, which learns to generically verify whether two audio signals are from the same speaker. Given an external database of recorded audio of the users, identification is carried out by verifying the speech input against each of its entries. If new users are encountered, only their recorded audio needs to be added to the external database for them to be identified, without…
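The identification-through-verification loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `cosine_verifier` is a hypothetical stand-in for the trained SCNN, the short feature vectors stand in for recorded audio, and the best-match rule with a `threshold` of 0.8 is an assumption, since the abstract does not state how the per-entry verification results are combined.

```python
import math
from typing import Callable, Dict, Optional, Sequence

Signal = Sequence[float]
Verifier = Callable[[Signal, Signal], float]


def cosine_verifier(a: Signal, b: Signal) -> float:
    """Hypothetical stand-in for the SCNN: scores how alike two signals are."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(
    query: Signal,
    database: Dict[str, Signal],
    verify: Verifier = cosine_verifier,
    threshold: float = 0.8,  # assumed decision threshold, not from the paper
) -> Optional[str]:
    """Verify the query against every database entry and return the best match.

    Returns None when no entry reaches the threshold (unknown speaker).
    """
    best_user, best_score = None, threshold
    for user, reference in database.items():
        score = verify(query, reference)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


# External database: one reference recording per enrolled user (toy vectors).
database = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}

known = identify([0.9, 0.1, 0.0], database)
unknown = identify([0.0, 0.0, 1.0], database)

# Enrolling a new user only adds an entry; the verifier itself is untouched.
database["carol"] = [0.0, 0.0, 1.0]
enrolled = identify([0.0, 0.1, 0.9], database)
```

In this sketch, adding `carol` changes only the database, which mirrors the point that new users can be identified without retraining the verifier.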
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Infant Health and Development
