Who is Authentic Speaker

Qiang Huang

arXiv:2405.00248·cs.SD·May 2, 2024

TL;DR

This paper investigates the challenge of identifying authentic speakers from voice-converted outputs using a deep learning-based recognition system, demonstrating promising results despite the acoustic alterations introduced by voice conversion.

Contribution

It introduces a hierarchical VLAD-based deep neural network model for robust speaker recognition from converted voices, addressing a key challenge in speaker verification.
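
For readers unfamiliar with VLAD (vector of locally aggregated descriptors), the following is a minimal sketch of the standard, non-hierarchical aggregation step in plain Python. It is not the paper's model: the function name `vlad`, the hard nearest-center assignment, and the toy inputs are assumptions made for illustration, and the hierarchical variant and the surrounding network are described only in the paper itself.

```python
import math


def vlad(frames, centers):
    """Aggregate frame-level features into one fixed-length VLAD vector.

    frames:  list of D-dimensional feature vectors (e.g. one per audio frame)
    centers: list of K cluster centers, each D-dimensional (assumed given)
    Returns a flat list of length K * D, L2-normalised.
    """
    k, d = len(centers), len(centers[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for x in frames:
        # Hard-assign the frame to its nearest center (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda j: sum((xi - ci) ** 2 for xi, ci in zip(x, centers[j])),
        )
        # Accumulate the residual between the frame and that center.
        for i in range(d):
            residual_sums[nearest][i] += x[i] - centers[nearest][i]
    # Flatten the K x D residual matrix and L2-normalise it.
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy usage: three 2-D frames, two centers (hypothetical values).
descriptor = vlad([[0.0, 1.0], [1.0, 0.0], [4.0, 5.0]], [[0.0, 0.0], [4.0, 4.0]])
```

The point of the sketch is that an utterance of any length maps to a vector of fixed size (here K × D = 4), which is what makes this kind of pooling convenient as the utterance-level layer of a speaker recognition network.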

Findings

01 High recognition accuracy on converted voices

02 Robustness against voice quality variations

03 Effective use of hierarchical VLAD in DNNs

Abstract

Voice conversion (VC) using deep learning technologies can now generate high-quality one-to-many voices and has therefore been used in practical application fields such as entertainment and healthcare. However, voice conversion can pose social risks when manipulated voices are used for deceptive purposes. Moreover, it is a major challenge to identify the real speakers behind converted voices, because the acoustic characteristics of the source speakers are changed greatly. In this paper we explore the feasibility of identifying authentic speakers from converted voices. The study assumes that certain information from the source speakers persists even when their voices are converted into different target voices. Our experiments are therefore geared towards recognising the source speakers given the converted voices, which are generated by…

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · AI in Service Interactions