One-Shot Speaker Identification for a Service Robot using a CNN-based Generic Verifier

Ivette Vélez (1), Caleb Rascon (1), Gibrán Fuentes-Pineda (1) ((1) Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS), Universidad Nacional Autónoma de México (UNAM), Mexico)

arXiv:1809.04115·eess.AS·September 13, 2018·5 cites

TL;DR

This paper introduces a Siamese CNN-based verifier that lets a service robot identify new users by voice without retraining. Input speech is verified against each entry in an external database of recorded user audio, so enrolling a new user only requires adding their recording to that database.

Contribution

A novel Siamese CNN architecture for speaker verification that allows one-shot identification without retraining, suitable for dynamic service environments.

Findings

01 High verification accuracy demonstrated in experiments

02 Effective in real-life noisy environments

03 Fast identification process suitable for real-time applications

Abstract

In service robotics, there is interest in identifying the user by voice alone. However, in application scenarios where a service robot acts as a waiter or a store clerk, new users are expected to enter the environment frequently. Typically, speaker identification models need to be retrained when this occurs, which can take an impractical amount of time. In this paper, a new approach for speaker identification through verification is developed using a Siamese Convolutional Neural Network (SCNN) architecture, which learns to generically verify whether two audio signals are from the same speaker. Given an external database of recorded audio of the users, identification is carried out by verifying the speech input against each of its entries. If new users are encountered, it is only required to add their recorded audio to the external database for them to be identified, without…
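The identification-through-verification idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `cosine_verifier` is a hypothetical stand-in for the trained SCNN (a plain cosine similarity over toy feature vectors), and the names `identify`, `database`, and the `threshold` value of 0.8 are invented for the example.

```python
import math
from typing import Callable, Dict, List, Optional

Signal = List[float]  # assumption: toy feature vectors stand in for audio


def cosine_verifier(a: Signal, b: Signal) -> float:
    """Hypothetical placeholder for the trained SCNN verifier.

    Returns a same-speaker score; higher means more likely the same speaker.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(
    query: Signal,
    database: Dict[str, Signal],
    verifier: Callable[[Signal, Signal], float] = cosine_verifier,
    threshold: float = 0.8,  # assumption: illustrative acceptance threshold
) -> Optional[str]:
    """Verify the query against every database entry and return the best match.

    Returns None when no entry reaches the threshold (unknown speaker).
    """
    best_name: Optional[str] = None
    best_score = float("-inf")
    for name, reference in database.items():
        score = verifier(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Enrolled users, one reference recording each (toy values).
database: Dict[str, Signal] = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
}

known = identify([0.9, 0.1, 0.0], database)

# Enrolling a new user is a database insert; the verifier is not retrained.
database["carol"] = [0.0, 0.0, 1.0]
newly_enrolled = identify([0.0, 0.1, 0.9], database)

# A query that matches no entry well enough is rejected.
unknown = identify([1.0, 1.0, 1.0], database)
```

In this sketch, identification cost grows linearly with the number of enrolled users, since the query is verified against each entry in turn; the verifier itself stays fixed while the database changes.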

Code & Models

Repositories

trungha-ngx/One-Shot-Speaker-Identification

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Infant Health and Development