DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis

Anurag Chowdhury; Arun Ross; Prabu David

arXiv:2012.05084 · cs.SD · February 16, 2021

TL;DR

DeepTalk introduces a novel prosody encoding network that captures vocal style features from raw audio, enhancing speaker recognition accuracy and improving speech synthesis quality by modeling F0 contours.

Contribution

The paper presents DeepTalk, a prosody encoding network that extracts vocal style features directly from raw audio. It outperforms several state-of-the-art speaker recognition systems and is integrated into a speech synthesizer to produce more natural synthetic speech.

Findings

01. DeepTalk outperforms several state-of-the-art speaker recognition systems across multiple challenging datasets.

02. Combining DeepTalk with a state-of-the-art physiological speech feature-based speaker recognition system further improves recognition performance.

03. DeepTalk captures F0 contours crucial for vocal style modeling.
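
For readers unfamiliar with the term, an F0 contour is the track of fundamental frequency (perceived pitch) over time. The sketch below is not DeepTalk's method, which learns its features from raw audio with a neural network. It only illustrates what an F0 contour is, using a classic autocorrelation pitch estimator on a synthetic tone. The function names, frame sizes, search range, and voicing threshold are all assumptions chosen for illustration.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the F0 (in Hz) of one frame from its autocorrelation peak.

    Returns 0.0 when the frame looks unvoiced (silent or no clear peak).
    The search range and the 0.3 threshold are illustrative assumptions.
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    energy = sum(x * x for x in frame)
    if energy == 0.0 or lag_max <= lag_min:
        return 0.0  # silent or too-short frame

    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    # A weak peak relative to the frame energy is treated as unvoiced.
    if best_lag == 0 or best_score / energy < 0.3:
        return 0.0
    return sample_rate / best_lag


def f0_contour(signal, sample_rate, frame_len=0.04, hop_len=0.02):
    """Slide a window over the signal and return one F0 value per frame."""
    n = int(round(frame_len * sample_rate))
    hop = int(round(hop_len * sample_rate))
    return [
        estimate_f0(signal[start:start + n], sample_rate)
        for start in range(0, len(signal) - n + 1, hop)
    ]


# Synthetic example: 0.2 s of a pure 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * t / sample_rate) for t in range(1600)]
contour = f0_contour(tone, sample_rate)
```

On real speech the contour rises and falls with intonation and drops to 0.0 in unvoiced regions; that time-varying pattern is the kind of prosodic cue the finding refers to.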

Abstract

Automatic speaker recognition algorithms typically characterize speech audio using short-term spectral features that encode the physiological and anatomical aspects of speech production. Such algorithms do not fully capitalize on speaker-dependent characteristics present in behavioral speech features. In this work, we propose a prosody encoding network called DeepTalk for extracting vocal style features directly from raw audio data. The DeepTalk method outperforms several state-of-the-art speaker recognition systems across multiple challenging datasets. The speaker recognition performance is further improved by combining DeepTalk with a state-of-the-art physiological speech feature-based speaker recognition system. We also integrate DeepTalk into a current state-of-the-art speech synthesizer to generate synthetic speech. A detailed analysis of the synthetic speech shows that the…

Code & Models

Repositories

iPRoBe-lab/DeepTalk
pytorch
