A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning

Mathieu Seurin, Florian Strub, Philippe Preux, and Olivier Pietquin

arXiv:2008.03127·eess.AS·August 10, 2020

TL;DR

This paper introduces Interactive Speaker Recognition, a novel approach that uses reinforcement learning to incrementally identify speakers by requesting personalized utterances, achieving high accuracy with minimal speech data.

Contribution

The paper proposes a new paradigm for speaker recognition that employs reinforcement learning to actively select utterances, improving efficiency and performance over traditional methods.

Findings

01. Achieves high accuracy with limited speech data

02. Uses reinforcement learning for sequential decision-making in speaker recognition

03. Could also serve as an utterance selection mechanism for building speech synthesis systems

Abstract

Speaker recognition is a well-known and well-studied task in the speech processing domain. It has many applications, such as security or speaker adaptation of personal devices. In this paper, we present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR). In this paradigm, the recognition system aims to incrementally build a representation of the speakers by requesting personalized utterances to be spoken, in contrast to the standard text-dependent or text-independent schemes. To do so, we cast the speaker recognition task into a sequential decision-making problem that we solve with Reinforcement Learning. Using a standard dataset, we show that our method achieves excellent performance while using only small amounts of speech signal. This method could also be applied as an utterance selection mechanism for building speech synthesis systems.
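
The sequential decision-making framing in the abstract can be illustrated with a toy episode loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the scalar "voice features", the nearest-profile guess, the sparse 0/1 reward, and every name below (`make_speakers`, `run_episode`, `random_policy`, `average_reward`) are hypothetical and chosen only for illustration. The random policy stands in for the policy that reinforcement learning would train.

```python
import random

def make_speakers(num_speakers, num_words, rng):
    # Hypothetical stand-in for voice features: one number per (speaker, word).
    return [[rng.gauss(0.0, 1.0) for _ in range(num_words)]
            for _ in range(num_speakers)]

def random_policy(observed, num_words, rng):
    # Placeholder for a learned policy: pick a not-yet-requested word uniformly.
    # Assumes at least one word is still unrequested (budget <= num_words).
    remaining = [w for w in range(num_words) if w not in observed]
    return rng.choice(remaining)

def run_episode(profiles, target, policy, budget, rng, noise=0.1):
    """One interactive episode: request `budget` words, then guess the speaker."""
    num_words = len(profiles[0])
    observed = {}  # state: word index -> noisy feature heard so far
    for _ in range(budget):
        word = policy(observed, num_words, rng)  # action: which word to request
        observed[word] = profiles[target][word] + rng.gauss(0.0, noise)

    # Guess the enrolled speaker whose features are closest on the requested words.
    def distance(speaker):
        return sum((profiles[speaker][w] - x) ** 2 for w, x in observed.items())

    guess = min(range(len(profiles)), key=distance)
    reward = 1.0 if guess == target else 0.0  # assumed sparse terminal reward
    return guess, reward

def average_reward(budget, episodes=200, seed=0):
    # Average terminal reward of the random policy for a given word budget.
    rng = random.Random(seed)
    profiles = make_speakers(num_speakers=5, num_words=10, rng=rng)
    total = 0.0
    for _ in range(episodes):
        target = rng.randrange(len(profiles))
        _, reward = run_episode(profiles, target, random_policy, budget, rng)
        total += reward
    return total / episodes
```

Calling `average_reward(budget)` for a few small budgets gives a rough baseline for how recognition accuracy depends on the number of requested words in this toy setting. A learned policy would replace `random_policy` and try to reach the same accuracy with fewer requests.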
